This is an archive of the discontinued LLVM Phabricator instance.

Fix handling of backslashes in Args parsing
ClosedPublic

Authored by labath on Feb 24 2015, 3:09 AM.

Details

Summary

Presently Args::SetCommandString allows quotes to be escaped with backslash. However, the
backslash itself is not removed from the argument, nor there is a way to escape the backslash
itself. This leads to surprising results:

"a b" c" -> 'a b', 'c' # Here we actually have an unterminated quote, but that is ignored
"a b\" c" -> 'a b\" c' # We try to escape the quote. That works but the backslash is not removed.
"a b\\" c" -> 'a b\\" c' # Escaping the backslash has no effect.

This change removes the possibility to escape the terminating quotes, making every quote run from
one quote character to the next one. The contents for the quote are always taken literally, just
like the single quotes in a posix shell. The only way to insert a literal quote character is to
close the quote and then escape it with a backslash or to use a different kind of quotes.

To summarize:
"a b" c" -> 'a b', 'c' # unterminated quotes are still ignored
"a b"\"" c" -> 'a b" c' # first way to insert a literal quote
'a b" c' -> 'a b" c' # second way to insert a literal quote
"a b\" c -> 'a b\', c # backslash has no effect, we still get two arguments
"a b\\" c -> 'a b\\', c # same here

This change also removes some dead quote-handling code. This code (branching on the value of
quote_char) was never executed since all quote handling was concentrated in the case '"': block,
and the value was always set to 0 after exiting the block.

Diff Detail

Repository
rL LLVM

Event Timeline

labath updated this revision to Diff 20587.Feb 24 2015, 3:09 AM
labath retitled this revision from to Fix handling of backslashes in Args parsing.
labath updated this object.
labath edited the test plan for this revision. (Show Details)
labath added a reviewer: clayborg.
labath added a subscriber: Unknown Object (MLST).

Can you add a test with the examples you summarized? What does a\b generate
with this change?

The handling of the backslash+random character sequence is unchanged. If the character is not recognised, the backslash is not removed
\b -> "\b"

k_escapable_characters defines which characters have the backslash removed. Currently it is: space, tab, backslash, single and double quote.

I will add some tests tomorrow.

Thanks. If you're going to add tests anyway, may I suggest one more for the
case just mentioned (backslash + letter).

Is SBArgs exposed through python? If not it might need to be a gtest

clayborg accepted this revision.Feb 24 2015, 11:15 AM
clayborg edited edge metadata.

You can always test this by just launching a process that prints out the arguments and parse the STDOUT to verify they got parsed correctly.

(lldb) process launch -- "a b\" c"

Then verify you got 2 args, where arg[1] == 'a b" c'

This revision is now accepted and ready to land.Feb 24 2015, 11:15 AM

Greg, a quick question as your example seems to indicate the opposite behavior of what I propose here:
a) current code: "a b\" c" -> 'a b\" c'
b) my patch: "a b\" c" -> 'a b\' 'c' (aka bash single quote)
c) your example: "a b\" c" -> 'a b" c' (aka bash double quote)

Which behavior do we want? I am fine with (b) or (c), but I found (a) confusing, hence the patch. We could even go more bash-like and have " work like (b) and ' like (c).

Or do you not care? :)

sh-3.2$ ./a.out "a b\" c"


Arguments

argv[ 0] = './a.out'
argv[ 1] = 'a b" c'

So "/bin/sh" on MacOSX seems to think case C is the correct one. Note the single quotes are not part of the argument, they are just used to dump the arguments in case they have spaces.

labath updated this revision to Diff 20760.Feb 26 2015, 7:06 AM
labath edited edge metadata.

I have taken your last comment as an expression of desire to do shell like quoting (which I
preferred anyway), so I have made a new version of the patch which tries to emulate that. In
double quotes, backslash can be used to escape things and is removed from the argument. In single quotes, the backslash is not special and is preserved. This has the added benefit that copy pasting program arguments from shell to "process launch" will most likely "just work".

I have also taken the liberty to modernize some of the code with StringRefs.

labath updated this revision to Diff 20761.Feb 26 2015, 7:15 AM

Added some comments to helper functions.

clayborg requested changes to this revision.Feb 26 2015, 9:46 AM
clayborg edited edge metadata.

Remove the function:

void
Args::SetCommandString (const char *command, size_t len)

now that we have a llvm::StringRef based one.

include/lldb/Interpreter/Args.h
124–126 ↗(On Diff #20761)

You can remove this function now that we have a llvm::StringRef version.

source/Interpreter/Args.cpp
147–151 ↗(On Diff #20761)

Remove this function now that we have a llvm::StringRef version.

This revision now requires changes to proceed.Feb 26 2015, 9:46 AM
labath updated this revision to Diff 20840.Feb 27 2015, 2:43 AM
labath edited edge metadata.

Remove the const char * + length variants of SetCommandString and Args constructor in favor of
the StringRef versions.

clayborg accepted this revision.Feb 27 2015, 9:38 AM
clayborg edited edge metadata.
This revision is now accepted and ready to land.Feb 27 2015, 9:38 AM
This revision was automatically updated to reflect the committed changes.