This is an archive of the discontinued LLVM Phabricator instance.


Differential D25929

Add llvm-echo command.
AbandonedPublic

Authored by ruiu on Oct 24 2016, 5:39 PM.


Details

Reviewers

inglorion
• rafael
rnk

Summary

The "echo" command is not very portable. Particularly on Windows, there are
various echo implementations that interpret backslashes, double quotes,
and single quotes differently. On Windows, the shell does not tokenize
command-line arguments; each command does, so the interpretation
varies depending on the CRT the command was built with.

As a result, we observed hard-to-fix errors that happened only on a
limited set of Windows buildbots.

This patch adds a portable "echo" command that always interprets
arguments in the Unix style, even on Windows. llvm-echo prints the
same output for the same arguments no matter what platform it is running on.
This command should fix the compatibility issue.
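
To make the tokenization point concrete, here is a minimal Python sketch that splits the same command-line string two ways. `shlex.split` stands in for Unix-style tokenization, and `tokenize_windows_style` is a hypothetical, simplified approximation of the MSVC CRT rules. It is not the real CRT and not code from this patch.

```python
import shlex

def tokenize_windows_style(cmdline):
    """Simplified sketch of MSVC-CRT-style splitting (an approximation)."""
    args, cur, in_quotes, have_arg = [], [], False, False
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == "\\":
            # Count the run of backslashes starting here.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                # Backslashes before a quote are halved.
                cur.append("\\" * (count // 2))
                if count % 2:
                    cur.append('"')          # odd run: literal quote
                else:
                    in_quotes = not in_quotes  # even run: quote toggles
                i = j + 1
            else:
                cur.append("\\" * count)     # otherwise backslashes are literal
                i = j
            have_arg = True
        elif c == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif c in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(cur))
                cur, have_arg = [], False
            i += 1
        else:
            cur.append(c)                    # single quotes are ordinary characters
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(cur))
    return args

cmdline = "echo 'a b'"
print(shlex.split(cmdline))             # Unix style: single quotes group words
print(tokenize_windows_style(cmdline))  # Windows style: single quotes are literal
```

The two tokenizers recover different argument vectors from one string, which is the kind of disagreement the summary describes.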

Diff Detail

Build Status

Buildable 789
Build 789: arc lint + arc unit

Event Timeline

ruiu updated this revision to Diff 75663. Oct 24 2016, 5:39 PM

ruiu retitled this revision to "Add llvm-echo command."

ruiu updated this object.

ruiu added reviewers: rnk, inglorion.

ruiu added a subscriber: llvm-commits.

Herald added subscribers: modocache, mgorny, beanz. Oct 24 2016, 5:39 PM

Could you provide a few examples of things that could be solved with this? I agree that we should not be using 'echo' because the command is pretty much unpredictable and each shell interprets it differently. However, I'd rather not reinvent the wheel when we already have, e.g., the POSIX printf command, which is much saner.

That said, I don't understand why you would need to tokenize arguments to echo.

In D25929#578441, @mgorny wrote:

That said, I don't understand why you would need to tokenize arguments to echo.

If the kernel gives you argv directly, then you don't need to tokenize. If, as on Windows, the kernel gives userspace a quoted string, then different executables may interpret it differently. We've had problems with shell commands like echo '"' where the way CPython quotes the string isn't compatible with the way MSys echo tokenizes it. See rL284768. We could try to standardize on printf everywhere, but it's hard to enforce that for new tests.
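
As a hedged illustration of the quoting step mentioned above: CPython's `subprocess` module contains a helper, `list2cmdline`, that flattens an argument list into one command-line string on Windows. It is plain Python, so it can be called on any platform to see the string a child process would receive. Treat it as an implementation detail rather than a stable API.

```python
import subprocess

# Each argv list is flattened into the single string that a Windows child
# process would have to tokenize again on its own.
examples = (["echo", '"'], ["echo", "a b"], ["echo", r"C:\asdf\t.exe"])
for argv in examples:
    print(argv, "->", subprocess.list2cmdline(argv))
```

A receiving program that does not treat `\"` as an escaped double quote would recover a different argument from the first string than the one that was passed in.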

I'm happy you're working on this, because writing short strings in lit tests is a common case, and having it not work reliably is painful. That said, I feel we need some way for lit to automatically do the right thing. If we don't have that, people are going to continue to add tests that will work fine on some platforms and mysteriously write the wrong string on other platforms. Since I expect that most people will continue to write echo for this, is the plan to follow this up with a change that makes lit use llvm-echo when people write echo in a lit RUN recipe?

It should be easy to run our own "echo" instead of the system's echo: I think it's doable just by naming this executable "echo" as opposed to "llvm-echo". I think the build subdirectory has higher precedence in $PATH in the lit tests, so any command that has the same name as a system command shadows the system one. If everybody's fine with that, I can name this echo instead of llvm-echo.
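
The PATH-precedence idea can be sketched on a POSIX system with `shutil.which`, which resolves a command name against an explicit search path. The directory names and the `make_fake_tool` helper below are hypothetical stand-ins for the build directory and a system directory. This is not how lit itself resolves commands.

```python
import os
import shutil
import stat
import tempfile

def make_fake_tool(directory, name):
    """Create a small executable placeholder called `name` in `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path

with tempfile.TemporaryDirectory() as build_bin, \
     tempfile.TemporaryDirectory() as system_bin:
    ours = make_fake_tool(build_bin, "echo")
    make_fake_tool(system_bin, "echo")
    # The build directory is listed first, so its "echo" is found first.
    search_path = os.pathsep.join([build_bin, system_bin])
    found = shutil.which("echo", path=search_path)
    shadowed = found is not None and os.path.samefile(found, ours)

print(shadowed)
```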

But if we do that, I think this echo command needs more features. I believe some tests pass command line options such as -n to echo.

I think we should leave it as llvm-echo and use lit substitutions to prefer our version.
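
A rough sketch of what a regex-based substitution does to a RUN line follows. lit substitutions are (pattern, replacement) pairs; `apply_substitutions` here is a simplified stand-in written for illustration, not lit's actual function, and the sample lines are made up.

```python
import re

# Hypothetical substitution list in the (pattern, replacement) form.
substitutions = [(r"^\s*echo\s", "llvm-echo ")]

def apply_substitutions(line, substitutions):
    """Apply each regex substitution to the line, in order."""
    for pattern, replacement in substitutions:
        line = re.sub(pattern, replacement, line)
    return line

print(apply_substitutions("echo foo > out.txt", substitutions))
print(apply_substitutions("true && echo foo", substitutions))
```

Because the pattern is anchored at the start of the line, the second line is left unchanged. A substitution of this shape would not catch an `echo` that appears after `&&` or a pipe.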

As for adding features, such as the -n option, I would like to make the case for changing those tests to use printf. echo is non-portable because different implementations have different, and sometimes conflicting, features. printf behaves much more predictably and has more features. For this reason, many people advise using printf instead of echo. Unfortunately, many people's first instinct is to use echo. I think it's a good idea to make sure this works consistently, regardless of what variant of echo may be installed on someone's system or listed first in their PATH.

For the purpose of consistent behavior, I feel it's good enough to have a very simple implementation of echo that just prints its arguments, as your implementation does. I don't see as strong a case for implementing anything more advanced. It would require more work to implement, and we would have to decide if we're going to support -n, and if we're going to support a way to print a literal -n, and whatever we decide would probably surprise someone somewhere. It seems to me we might as well go with the simple implementation and tell people to use printf if they want something more advanced. We have the benefit that if someone tries to use -n to omit the final newline, the simple implementation will very obviously not do what they want, which will hopefully cause them to look for a solution and use printf instead. That, in turn, would then work if someone were to copy-paste the command line from the lit test into their shell, regardless of what variant of echo is on their system. This is much better than what we currently have, where one can use echo to write something that works on one system and misbehaves on another.
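
The "very simple implementation" argument can be illustrated with a minimal Python sketch. `simple_echo` is a hypothetical name; it mirrors the behavior described above (join the arguments with spaces, append a newline, parse no options) and is not the patch's C++ code.

```python
def simple_echo(args):
    """Join arguments with single spaces and end with a newline.

    No option parsing: "-n" is printed like any other argument.
    """
    return " ".join(args) + "\n"

print(repr(simple_echo(["hello", "world"])))
print(repr(simple_echo(["-n", "hello"])))
```

With this behavior, a test that passes -n sees the flag in its output and still gets the trailing newline, so the mismatch is visible immediately rather than platform-dependent.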

I'm OK with going with this echo and replacing all tests that use "echo -n" with printf or something.

But printf is not a solution for the specific problem I'm trying to address. Different printf commands on Windows could tokenize command line arguments differently, so it doesn't matter whether it's printf or echo. The problem is that the Windows command line tokenization rule is weird and inconsistent from command to command.

Right, that's a good point. Of course, we could also implement an llvm-printf command, but that sounds like more effort than just implementing support for -e in llvm-echo. Given that, I'm fine if you want to go that route.

Use llvm-echo instead of echo for all lit tests.

LGTM

This revision is now accepted and ready to land. Oct 26 2016, 5:21 AM

rnk requested changes to this revision. Oct 26 2016, 8:49 AM

rnk edited edge metadata.

rnk added inline comments.

utils/lit/lit/TestRunner.py
662	lit is supposed to be independent from LLVM. These substitutions are normally done in each project's lit.cfg. If you search for where we add the FileCheck substitution you should be able to do something similar.
utils/llvm-echo/llvm-echo.cpp
18	Let's use llvm::outs() instead of std::cout just for consistency with the rest of LLVM.
33	Why is this TokenizeGNUCommandLine and not TokenizeWindowsCommandLine? Which one works with CPython?

This revision now requires changes to proceed. Oct 26 2016, 8:49 AM

Reverted a change to TestRunner.py.
Use llvm::outs() instead of std::cout.

ruiu marked 2 inline comments as done. Oct 26 2016, 12:01 PM

ruiu added inline comments.

utils/llvm-echo/llvm-echo.cpp
33	Does it make sense? I'm trying to get the exact same result on all platforms instead of trying to get consistent results only on Windows.

majnemer added a subscriber: majnemer. Oct 26 2016, 1:10 PM

majnemer added inline comments.

utils/llvm-echo/llvm-echo.cpp
29–36	This should probably be a call to `Process::GetArgumentVector`; it abstracts away this difference.

I have a better idea, let's fix this in lit. I'll send a patch soon.

utils/llvm-echo/llvm-echo.cpp
33	I think this will do the wrong thing for a command line like: echo C:\asdf\t.exe. The GNU version will probably eat the backslashes.
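
The backslash concern can be reproduced with Python's `shlex.split`, used here only as a stand-in for Unix-style tokenization. Whether `TokenizeGNUCommandLine` behaves identically in every case is an assumption.

```python
import shlex

# In POSIX mode an unquoted backslash escapes the next character,
# so the backslash itself is dropped from the token.
unquoted = shlex.split(r"echo C:\asdf\t.exe")
print(unquoted)

# Inside single quotes the backslashes are kept as written.
quoted = shlex.split(r"echo 'C:\asdf\t.exe'")
print(quoted)
```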

I think https://reviews.llvm.org/D25929 makes this unnecessary.

ruiu abandoned this revision. Oct 26 2016, 1:39 PM

Revision Contents

Path                              Size
CMakeLists.txt                    1 line
utils/lit/lit/TestRunner.py       8 lines
utils/llvm-echo/CMakeLists.txt    3 lines
utils/llvm-echo/llvm-echo.cpp     45 lines

Diff 75824

CMakeLists.txt

[...]
 add_subdirectory(lib)

 if( LLVM_INCLUDE_UTILS )
   add_subdirectory(utils/FileCheck)
   add_subdirectory(utils/PerfectShuffle)
   add_subdirectory(utils/count)
   add_subdirectory(utils/not)
+  add_subdirectory(utils/llvm-echo)
   add_subdirectory(utils/llvm-lit)
   add_subdirectory(utils/yaml-bench)
   add_subdirectory(utils/unittest)
 else()
   if ( LLVM_INCLUDE_TESTS )
     message(FATAL_ERROR "Including tests when not building utils will not work.
     Either set LLVM_INCLUDE_UTILS to On, or set LLVM_INCLDE_TESTS to Off.")
   endif()
[...]

utils/lit/lit/TestRunner.py

[...]
 def getDefaultSubstitutions(test, tmpDir, tmpBase, normalize_slashes=False):
[...]
     else:
         substitutions.extend([
                 ('%:s', sourcepath),
                 ('%:S', sourcedir),
                 ('%:p', sourcedir),
                 ('%:t', tmpBase + '.tmp'),
                 ('%:T', tmpDir),
                 ])

+    # Use llvm-echo instead of echo because some echo commands on Windows
+    # interpret command line arguments differently and thus echoes
+    # different strings. (On Windows, command line tokenization is not
+    # done by shell but by each command's CRT, so ARGV may vary for the
+    # same arguments.)
+    substitutions.extend([('^\s*echo\s', 'llvm-echo ')])
[Inline comment] rnk (Done): lit is supposed to be independent from LLVM. These substitutions are normally done in each project's lit.cfg. If you search for where we add the FileCheck substitution you should be able to do something similar.
+
     return substitutions

 def applySubstitutions(script, substitutions):
     """Apply substitutions to the script.  Allow full regular expression syntax.
     Replace each matching occurrence of regular expression pattern a with
     substitution b in line ln."""
     def processLine(ln):
         # Apply substitutions
[...]

utils/llvm-echo/CMakeLists.txt

This file was added.

add_llvm_utility(llvm-echo llvm-echo.cpp)

target_link_libraries(llvm-echo LLVMSupport)

utils/llvm-echo/llvm-echo.cpp

This file was added.

//===- llvm-echo.cpp - The 'echo' command --------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This is an implementation of a portable "echo" command.
// It tokenizes command line arguments in the Unix style even on Windows.
//
//===----------------------------------------------------------------------===//

#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/StringSaver.h"
#include <iostream>
[Inline comment] rnk (Done): Let's use llvm::outs() instead of std::cout just for consistency with the rest of LLVM.

#if LLVM_ON_WIN32
#include <windows.h>
#endif

using namespace llvm;

int main(int Argc, const char **Argv) {
  SmallVector<const char *, 4> Args;

#if LLVM_ON_WIN32
  const char *Cmdline = GetCommandLineA();
  BumpPtrAllocator Alloc;
  StringSaver Saver(Alloc);
  llvm::cl::TokenizeGNUCommandLine(Cmdline, Saver, Args);
[Inline comment] rnk (Not Done): Why is this TokenizeGNUCommandLine and not TokenizeWindowsCommandLine? Which one works with CPython?
[Inline comment] ruiu (Author, Not Done): Does it make sense? I'm trying to get the exact same result on all platforms instead of trying to get consistent results only on Windows.
[Inline comment] rnk (Not Done): I think this will do the wrong thing for a command line like: echo C:\asdf\t.exe. The GNU version will probably eat the backslashes.
#else
  Args.insert(Args.begin(), Argv, Argv + Argc);
#endif
[Inline comment] majnemer (Not Done): This should probably be a call to `Process::GetArgumentVector`; it abstracts away this difference.

  for (int I = 1, E = Args.size(); I < E; ++I) {
    if (I != 1)
      std::cout << " ";
    std::cout << Args[I];
  }
  std::cout << "\n";
  return 0;
}
