This is an archive of the discontinued LLVM Phabricator instance.

Add llvm-echo command.
AbandonedPublic

Authored by ruiu on Oct 24 2016, 5:39 PM.

Details

Summary

"echo" command is not very portable. Particularly on Windows, we have
various types of echo command that interpret backslashes, double-quotes
and single-quotes differently. On windows, shell does not tokenize
command line arguments but each command does, so the interpretation
varies depending on your crt.

As a result, we observed hard-to-fix errors that happened only on a
limited set of Windows buildbots.

This patch adds a portable "echo" command which always interprets
arguments in the Unix style even on Windows. llvm-echo returns the
same string for a string no matter what platform it is running.
This command should fix the compatibility issue.

Event Timeline

ruiu updated this revision to Diff 75663.Oct 24 2016, 5:39 PM
ruiu retitled this revision from to Add llvm-echo command..
ruiu updated this object.
ruiu added reviewers: rnk, inglorion.
ruiu added a subscriber: llvm-commits.

Could you provide a few examples of things that could be solved with this? I agree that we should not be using 'echo' because the command is pretty much unpredictable and each shell interprets it differently. However, I'd rather not reinvent a wheel when we should have e.g. POSIX printf command that's much more sane.

That said, I don't understand why would you need to tokenize arguments to echo.

rnk edited edge metadata.Oct 25 2016, 9:24 AM

That said, I don't understand why would you need to tokenize arguments to echo.

If the kernel gives you argv directly, then you don't need to tokenize. If, as on Windows, the kernel gives userspace a quoted string, then different executables may interpret it differently. We've had problems with shell commands like echo '"' where the way CPython quotes the string isn't compatible with the way MSys echo tokenizes it. See rL284768. We could try to standardize on printf everywhere, but it's hard to enforce that for new tests.

inglorion edited edge metadata.Oct 25 2016, 1:45 PM

I'm happy you're working on this, because writing short strings in lit tests is a common case, and having it not work reliably is painful. That said, I feel we need some way for lit to automatically do the right thing. If we don't have that, people are going to continue to add tests that will work fine on some platforms and mysteriously write the wrong string on other platforms. Since I expect that most people will continue to write echo for this, is the plan to follow this up with a change that makes lit use llvm-echo when people write echo in lit run recipe?

ruiu added a comment.Oct 25 2016, 3:34 PM

It should be easy to run our own "echo" instead of system's echo because I think it's doable just by renaming this executable "echo" as opposed to "llvm-echo". I think the build subdirectory has higher precedence in $PATH in the lit tests, so any command that has the same name as a system command overwrites the existing one. If everybody's fine, I can name this echo instead of llvm-echo.

But if we do that, I think this echo command needs more features. I believe some tests pass command line options such as -n to echo.

rnk added a comment.Oct 25 2016, 3:37 PM

I think we should leave it as llvm-echo and use lit substitutions to prefer our version.

As for adding features, such as the -n option, I would like to make the case for modding those to using printf. echo is non-portable because different implementations have different, and sometimes conflicting features. printf behaves much more predictably and has more features. For this reason, many people advise to use printf instead of echo. Unfortunately, many people's first instinct is to use echo. I think it's a good idea to make sure this works consistently, regardless of what variant of echo may be installed on someone's system or listed first in their PATH.

For the purpose of consistent behavior, I feel it's good enough to have a very simple implementation of echo that just prints its arguments, as your implementation does. I don't see as strong a case for implementing anything more advanced. It would require more work to implement, and we would have to decide if we're going to support -n, and if we're going to support a way to print a literal -n, and whatever we decide would probably surprise someone somewhere. It seems to me we might as well go with the simple implementation and tell people to use printf if they want something more advanced. We have the benefit that if someone tries to use -n to omit the final newline, the simple implementation will very obviously not do what they want, which will hopefully cause them to look for a solution and use printf instead. That, in turn, would then work if someone were to copy-paste the command line from the lit test into their shell, regardless of what variant of echo is on their system. This is much better than what we currently have, where one can use echo to write something that works on one system and misbehaves on another.

ruiu added a comment.Oct 25 2016, 4:34 PM

I'm OK with going with this echo and replace all tests using "echo -n" with printf or something.

But printf is not a solution for the specific problem I'm trying to address. Different printf commands on Windows could tokenize command line arguments differently, so it doesn't matter whether it's printf or echo. The problem is that the Windows command line tokenization rule is weird and inconsistent with command to command.

Right, that's a good point. Of course, we could also implement an llvm-printf command, but that sounds like more effort than just implementing support for -e in llvm-echo. Given that, I'm fine if you want to go that route.

ruiu updated this revision to Diff 75824.Oct 25 2016, 6:22 PM
ruiu edited edge metadata.
  • Use llvm-echo instead of echo for all lit tests.
rafael accepted this revision.Oct 26 2016, 5:21 AM
rafael added a reviewer: rafael.
rafael added a subscriber: rafael.

LGTM

This revision is now accepted and ready to land.Oct 26 2016, 5:21 AM
rnk requested changes to this revision.Oct 26 2016, 8:49 AM
rnk edited edge metadata.
rnk added inline comments.
utils/lit/lit/TestRunner.py
662

lit is supposed to be independent from LLVM. These substitutions are normally done in each project's lit.cfg. If you search for where we add the FileCheck substitution you should be able to do something similar.

utils/llvm-echo/llvm-echo.cpp
18

Let's use llvm::outs() instead of std::cout just for consistency with the rest of LLVM.

33

Why is this TokenizeGNUCommandLine and not TokenizeWindowsCommandLine? Which one works with CPython?

This revision now requires changes to proceed.Oct 26 2016, 8:49 AM
ruiu updated this revision to Diff 75925.Oct 26 2016, 11:58 AM
ruiu edited edge metadata.
  • Reverted a change to TestRunner.py.
  • Use llvm::outs() instead of std::cout.
ruiu marked 2 inline comments as done.Oct 26 2016, 12:01 PM
ruiu added inline comments.
utils/llvm-echo/llvm-echo.cpp
33

Doe it make sense? I'm trying to get the exact same result on all platforms instead of trying to get consistent results only on Windows.

majnemer added inline comments.
utils/llvm-echo/llvm-echo.cpp
29–36

This should probably be a call to Process::GetArgumentVector, it abstracts away this difference.

rnk added a comment.Oct 26 2016, 1:14 PM

I have a better idea, let's fix this in lit. I'll send a patch soon.

utils/llvm-echo/llvm-echo.cpp
33

I think this will do the wrong thing for a command line like:

echo C:\asdf\t.exe

The gnu version will probably eat the backslashes.

rnk added a comment.Oct 26 2016, 1:29 PM

I think https://reviews.llvm.org/D25929 makes this unnecessary.

ruiu abandoned this revision.Oct 26 2016, 1:39 PM