That would allow to recursively compare directories in tests using
"diff -r" on Windows in a similar way as it can be done on Linux or Mac.
Details
Diff Detail
- Build Status
Buildable 13589 Build 13589: arc lint + arc unit
Event Timeline
I haven't looked too closely, but I think this test might fail:
RUN: mkdir -p %t/A %t/B RUN: touch %t/A/only_in_a %t/B/only_in_b RUN: not diff -r %t/A %t/B
If there are tests for lit itself, it might be useful to check this one in.
That's a great point! I should add names for non empty files as well! Also will look for TestRunner tests, thanks Vedant!
utils/lit/lit/TestRunner.py | ||
---|---|---|
378 | Does os.walk have well-defined ordering? What if contents of dir1 are visited in a different order than dir2? | |
386 | Do we also want to compare file names? E.g., what if we have dir1/a # Contents: 12345 dir2/b # Contents: 12345 Won't this implementation consider dir1 and dir2 to be the same? |
Since a user of lit is going to expect diff -r to "just work", then it needs to match the semantics of what the tool diff -r actually does. The patch here will print "something" and return an error code if there are differences, but it won't print the same thing as the actual diff utility when used with -r. I think if we implement this in lit, we need to make the output look the same. For example, a person might try to use this to verify that two directories differ only in one file. (for example, one has an a.txt and another has a b.txt). So they might run diff -r and filecheck the output against this:
Only in d:\foo: a.txt Only in d:\bar: b.txt
But the code here won't print that. In fact, if the *contents* of a.txt and b.txt are the same but just the names are different, then I think (from inspecting this patch) that the diff would actually succeed, since it just appends file contents to a buffer.
So I think if we want to implement diff -r here, you need to make sure it exactly matches the output of the unix tool for various inputs.
Implement the same output as produced by regular diff utility + added tests for that.
Please take a look :)
utils/lit/lit/TestRunner.py | ||
---|---|---|
378 | Resolved now, as I put filepaths and contents into a dict. | |
utils/lit/tests/Inputs/shtest-shell/diff-r.txt | ||
27 | I was hoping to use FileCheck --check-prefix and different prefixes, but for some reason I'm getting weird errors, e.g.: ... $ "not" "diff" "-r" "/usr/local/google/home/mmoroz/Projects/llvm/build/utils/lit/tests/Inputs/shtest-shell/Output/diff-r.txt.tmp/dir1" "/usr/local/google/home/mmoroz/Projects/llvm/build/utils/lit/tests/Inputs/shtest-shell/Output/diff-r.txt.tmp/dir2" $ "FileCheck" "--check-prefix=ERROR1" "/usr/local/google/home/mmoroz/Projects/llvm/build/utils/lit/tests/Inputs/shtest-shell/diff-r.txt" # command stderr: /usr/local/google/home/mmoroz/Projects/llvm/build/utils/lit/tests/Inputs/shtest-shell/diff-r.txt:27:11: error: expected string not found in input # ERROR1: Only in %t/dir1: dir1unique ^ <stdin>:1:1: note: scanning from here Only in /usr/local/google/home/mmoroz/Projects/llvm/build/utils/lit/tests/Inputs/shtest-shell/Output/diff-r.txt.tmp/dir1: dir1unique ^ <stdin>:1:106: note: possible intended match here Only in /usr/local/google/home/mmoroz/Projects/llvm/build/utils/lit/tests/Inputs/shtest-shell/Output/diff-r.txt.tmp/dir1: dir1unique ^ error: command failed with exit status: 1 I'm fine to proceed with grep, if reviewers don't mind. |
utils/lit/tests/Inputs/shtest-shell/diff-r.txt | ||
---|---|---|
27 | You can't use substitutions like %t in a check prefix. Those are expanded only in run lines. The general solution for this is to wrap that part in a regex. So for example, instead of this: ERROR1: Only in %t/dir1: dir1unique You would write this: ERROR1: Only in {{.*}}dir1: dir1unique That said, I would really prefer if we could use the check lines instead of grep, since that's the canonical way of writing tests and there are other advantages as well (CHECK-NEXT, etc) |
utils/lit/tests/Inputs/shtest-shell/diff-r.txt | ||
---|---|---|
27 | That totally works! Thank you, Zachary! I've tried that before as well, but I put string into double-quotes, so the regex didn't work for me neither :) |
utils/lit/lit/TestRunner.py | ||
---|---|---|
383–385 | Would it be more efficient to just save off the paths instead of reading the lines up front? You can imagine a case where a folder has a ton of files, but the other folder has none of those files. Reading all those lines up front and making a strong reference to them is going to consume a huge amount of memory. But you might not even need the contents of the file because the file might not exist in the other folder anyway, in which case you can early out on it. It seems like all you need to do is build up a tuple like (dirname, child_dirs, files), where child_dirs is itself a list of tuples of the exact same form. But don't read the contents until you know you have to do the comparison, and throw the contents away before you do the next comparison, to guarantee your'e not loading an entire directory tree's worth of file contents into memory at once. | |
395 | s/compate/compare/ | |
418 | s/compate/compare/ | |
424–427 | This looks a little strange to me. stdout.write doesn't add a newline does it? Wouldn't this mash everything together on a single line? Any reason not to use print here? | |
432 | s/compate/compare/ |
By the way, I'm having some troubles running shtest-shell on Linux. Is there any proper way to run it, or maybe we have a test bot that can do it for me? In the worst case I can try that on Windows, but it would take some extra time...
This looks good to me. One minor suggestion is to make sure this works with Python 3. If you're already confident it does, you don't have to do anything. But we do have at least one bot running Python 3, and i've been hit with Py3 specific errors before and had to fix it while the bot was broken. Either way, up to you. Thanks for working on this!
Someone else will need to answer this one, as I have really no idea :-/ Sorry
Thanks for the review and for the heads up. I've spent enough time trying to set up a work environment on Windows, but still can't say that it works fine. I'll land this and see how it goes on the bots.
- For some reason, it appears to be important to use stdout.write instead of print, so replacing these.
- Commands implemented in TestRunner.py can't be used as part of a pipeline, this is why there are several diff-error-* tests. I don't see a better way to test negative scenarios rather than split my test into diff-r-error-* and check their output in shtest-shell.py as it's done for existing tests
Does os.walk have well-defined ordering? What if contents of dir1 are visited in a different order than dir2?