Page MenuHomePhabricator

Fix llvm-objcopy when directory contains "bar".
AbandonedPublic

Authored by jlebar on Jan 7 2020, 12:07 PM.

Details

Summary

My home directory, "/Users/jlebar", appears in the output of many tools.
Tests which naively check that "bar" does not appear in a tools' output
will fail for me.

I've "fixed" two llvm-objcopy tests here. It's pretty hacky, and if
anyone has "barrr" in their username, they'll hit the same problem.

Event Timeline

jlebar created this revision.Jan 7 2020, 12:07 PM
MaskRay added inline comments.Jan 7 2020, 12:19 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
11

# BAR: Name: bar

rupprecht added inline comments.Jan 7 2020, 12:45 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

I assume the match you're getting is the first line from llvm-readobj:

File: /Users/jlebar/path/to/something.o
Format: ELF64-x86-64
Arch: x86_64
AddressSize: 64bit

Piping from stdin will remove that filename; can you instead try:

# RUN: llvm-readobj --symbols < %t2.o | FileCheck %s --implicit-check-not=bar
MaskRay added inline comments.Jan 7 2020, 1:53 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

--implicit-check-not='Name: bar'

6

Using < file input redirection may leave us in a very inconvenient situation. Maybe we should just be more careful when using --implicit-check-not, instead of replacing every single use of llvm-readobj %t with llvm-readobj < %t.

6

I'd like to hear @jhenderson what wants to say here.

rupprecht added inline comments.Jan 7 2020, 2:02 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

When you say it's inconvenient, can you clarify why?

The practice of using piping to avoid the filename present is documented here, albeit for a different tool (opt instead of llvm-readobj): http://llvm.org/docs/TestingGuide.html#fragile-tests

jhenderson added inline comments.Jan 8 2020, 12:56 AM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

This isn't likely to affect too many people, but if a test redirects from stdin, it is a pain to rerun a part of it in my default natural environment, Windows PowerShell, because PowerShell doesn't support the '<' redirection command for some unknown reason.

MaskRay's suggestions in this test would seem more preferable to me therefore, and less fragile for other reasons too, I reckon: "Name: bar" is much less likely to appear anywhere in the output beyond the bit being checked. Also, "bar" is a common enough three-letter pattern to appear in another word that even shell redirection won't necessarily solve it in all cases or could prove flaky in the future if more output is added to llvm-readobj.

MaskRay added inline comments.Jan 10 2020, 1:40 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

@rupprecht Do we have a consensus now?

rupprecht added inline comments.Jan 24 2020, 1:21 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

This isn't likely to affect too many people, but if a test redirects from stdin, it is a pain to rerun a part of it in my default natural environment, Windows PowerShell, because PowerShell doesn't support the '<' redirection command for some unknown reason.

There seems to be thousands of llvm-based tests already that use use < as a pipe. It's a shame PowerShell doesn't support it.
PowerShell does seem to have a replacement called Get-Content. I wonder if we could have lit stdout rewrite the RUN commands so "foo < x y z" is printed out as "Get-Content x | foo y z"?

$ find . -type f -path '*test*' -print0 | xargs -0 grep -l 'RUN.*llvm-.*<' | wc -l
3704

PowerShell does seem to support the pipe operator, so cat %t2.o | llvm-readobj --symbols might also work, although I only see one test that uses cat this way. (I see lots of cat ... | FileCheck, but nothing piping into a tool). I think if we want to support PowerShell we should do something about the thousands of tests and the documentation using PowerShell-incompatible features.

Anyway... I'm still slightly leaning toward RUN: llvm-readobj --symbols < %t2.o because (1) it matches what the rest of llvm testing does, and (2) it actually prevents full filenames from leaking into the tool output, so you don't have to play games with more advanced FileCheck options. (There is a risk that the filename might find some other way to creep in, but I think that's mostly theoretical).
OTOH, I also don't want these tests to be failing for users based on their username, so either patch is better than nothing & perfect should not be the enemy of good and all that, so if nobody else likes < then feel free to submit whichever patch you like.

dblaikie added inline comments.
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

Yeah - please stick with "<" given the precedent in the project already. If there's a systemic issue with that that could be addressed by a similarly legible alternative or improvements to Lit, I think that's worth discussing separately.

(though it's not clear to me if @MaskRay's "Using < file input redirection may leave us in a very inconvenient situation. " refers to the issue that @jhenderson mentioned, or was alluding to some other, as yet unclarified, inconvenience - if there's some other unknowns here, happy to discuss them - though, again, given the precedent, please go ahead with using "<" (if it works as well here as in the many other tests it's used in) and start a separate llvm-dev discussion, perhaps, on addressing those other issues)

MaskRay added a subscriber: bbaren.Jan 24 2020, 2:05 PM
MaskRay added inline comments.
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

I should have been clearer. Some GNU binutils (nm, objdump, size) have a default file name: "a.out", when no filename is specified. llvm-nm, llvm-objdump, llvm-size and llvm-dwarfdump learned this rule from GNU binutils. I have a weak preference not doing this, but I don't have a strong enough motivation to change it, and I feel someone may report bugs if LLVM binary utilities don't have the default file name.

For such utilities, llvm-nm < %s will not read from stdin. If we want to change some utilities (llvm-objcopy in question) to follow the convention of traditional LLVM tools (llvm-as/llc/opt/etc), the aforementioned tools will inevitably cause inconsistency which cannot be fixed.

A balance between "don't let users report bugs" and "don't let such behavior get in our way" is not changing our current practice, but making the tests a bit more robust (D72358). By doing that, we can circumvent the problem we encountered by @jlebar and @bbaren. I did worry whether that would bring more mental burden when writing tests, but from my experience in D72358 it does not look that bad.

dblaikie added inline comments.Jan 24 2020, 2:24 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

If any of our tools have such default behavior and need it, they should have that behavior - our testing strategy shouldn't dictate feature usability.

But I'd still advocate for the same testing strategy using such a feature, using "toolname - < inputname" to tell the tool to read from stdin, then pipe the file into stdin. The existence of a "no file name means a.out" doesn't change the general approach, in my opinion.

I'd rather not circumvent only the current problem - all sorts of other substrings may appear in a user's path that might confuse tests & we should avoid having the path in the test at all. (yeah, we don't do this for all tests - but I think when it comes up it's worth fixing firmly & avoiding the input name appearing anywhere in the checked output)

MaskRay added inline comments.Jan 24 2020, 2:36 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6
% size - < a
size: '-': No such file

Well, this can be a feature request to GNU binutils.

Doing things differently for llvm-size - < a and llvm-readobj -d < a seem a bit inconvenient.

For consistency, we may need to change every binary utilities use case to either - < a or < a. I acknowledge the filename-in-output problem, but am worried that the proposed fix is causing too much burden.

MaskRay added inline comments.Jan 24 2020, 2:44 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

We have 700+ occurrences of llvm-objcopy, nears 4000 occurrences of llvm-readobj in tests. Very few of them have the problem. Adapting every test case to be consistent with 2 llvm-objcopy tests is too big a hammer.

dblaikie added inline comments.Jan 24 2020, 7:52 PM
llvm/test/tools/llvm-objcopy/ELF/only-section-strip-undefined.test
6

I'm not suggesting changing all occurrences - though I am suggesting that piping input is best practice in LLVM for writing input tests (& this extends beyond objcopy and readobj - similarly for LLVM IR tests too, etc - piping input ensures the on-disk path doesn't appear in the output in a number of different tools) & the preferred solution when tests end up having these sorts of problems.

(taking the discussion out-of-line, so that it's easier to read and quote)

I wonder if we could have lit stdout rewrite the RUN commands so "foo < x y z" is printed out as "Get-Content x | foo y z"?

FWIW, I don't think complicating lit to support one specific shell (PowerShell) is that great an idea, given there's a straightforward enough workaround for me (drop down to cmd). It's inconvenient but I can handle it where it is needed (but I wouldn't want it to be more widespread than necessary).

I actually find something like yaml2obj %s | llvm-readobj --symbols - (or similar) more inconvenient, since it means there's no object on disk that can be easily inspected to see what the problem is when maintaining the test. I don't think this is being suggested here though.

Some GNU binutils (nm, objdump, size) have a default file name: "a.out", when no filename is specified. llvm-nm, llvm-objdump, llvm-size and llvm-dwarfdump learned this rule from GNU binutils. I have a weak preference not doing this, but I don't have a strong enough motivation to change it, and I feel someone may report bugs if LLVM binary utilities don't have the default file name.

We shouldn't change the behaviour of tools with regards to a.out. This was attempted last year by @abrachet and reverted after various objections and problems, since there are configure scripts out there that assume the name is that.

Checks like # BAR: bar should be avoided regardless of path issues, as they have too great a chance of matching something else spuriously (there are plenty of words in the English language that have "bar" as a substring). Changing to redirecting via stdin doesn't solve this. Using something like barrr is less of an issue, but providing the context of the Name: field name makes the test more robust and more readable in my opinion. The chance of "Name: bar" appearing in somebody's path is minimal at best, and is even less likely if you add the end of line {{$}} pattern to the check.

jlebar abandoned this revision.Feb 6 2020, 4:13 PM

Thank you for the reviews, and I'm sorry I dropped off the face of the earth here! I was messing with my phab settings and I think I somehow disabled all emails. Oops.

Looks like Ray fixed this in the meantime (D72358), so dropping this change.