This is an archive of the discontinued LLVM Phabricator instance.

Fix JSON compilation database command unescaping.
Needs ReviewPublic

Authored by dcheng on Nov 23 2014, 12:41 AM.

Details

Reviewers
klimek
Summary

The command line argument unescaper now only skips over a backslash if
the next character is a double quote character. This matches the
compilation database spec and the compilation databases generated by
tools like Ninja and CMake.

Diff Detail

Event Timeline

dcheng updated this revision to Diff 16535.Nov 23 2014, 12:41 AM
dcheng retitled this revision from to Fix JSON compilation database command unescaping..
dcheng updated this object.
dcheng edited the test plan for this revision. (Show Details)
dcheng added a reviewer: klimek.
dcheng added a subscriber: Unknown Object (MLST).
dcheng updated this revision to Diff 16540.Nov 23 2014, 12:06 PM

Rename skipEscapeCharacter() to skipEscapeForDoubleQuote().

klimek edited edge metadata.Nov 24 2014, 1:29 AM

Did you abandon the revision on purpose? I'm also not understanding the fix
quite yet... which test did break before?

I abandoned the previous diff, because I was flailing about the interface and had no idea what I was doing.

As for the fix, the JSON compilation database spec (http://clang.llvm.org/docs/JSONCompilationDatabase.html) says:

Parameters use shell quoting and shell escaping of quotes.

The problem is that skipEscapeCharacter() always skips over a backslash character, instead of the behavior described in the spec (only shell escaping of quotes). Ninja and CMake expect that shell escaping is only applied to quotes, and output compile database commands like:

"clang-cl.exe path\\to\\input.cc"

After JSON unescaping, the command looks like:

clang-cl.exe path\to\input.cc

But because the argument parser always skips backslashes, JSONCompilationDatabase::getCompileCommands ends up returning:

clang-cl.exe pathtoinput.cc

Which obviously doesn't work. One alternative is to decide that all normal shell escapes apply and change the spec. This has one big disadvantage: the JSON decoder already turns escape sequences into their special character (e.g. \n to an actual newline character). Why does the argument unescaper need to do the same thing?

Finally, there were no tests for the command argument unescaping behavior, so I modified JSONCompilationDatabase.GetAllCompileCommands to cover several argument parser behaviors:

  • Double quotes allow spaces to be included in a single argument.
  • Escaped quotes are skipped over inside a double quoted argument.
  • Backslashes are included in the unescaped argument if they weren't escaping a quote.
  • Backslashes are allowed as the last character of a single argument.

Also, if you want to update the patch, you don't need to abandon the
revision - if you abandon the revision there is not much more you can do
with it... (and I cannot patch it in and see what you actually are trying
to fix).

The general idea behind the unescaping is that we want to be able to put
shell commands "as if you wanted to pass them to shell" into the "command"
entry...

Ok, then what you plan to do makes sense. Can you open a new revision on
phab (or re-open this one) so I can comment?

Ah, it seems unabandoned again...

klimek added inline comments.Nov 24 2014, 5:32 AM
lib/Tooling/JSONCompilationDatabase.cpp
92

I'd rather not --Position.
To me this would be easier to verify:

if (*Position == '\\' && Position + 1 != Input.end() && *(Position + 1) == '"')
  return next();
return true;
unittests/Tooling/CompilationDatabaseTest.cpp
93

Please test this in the style of the extra test functions below and leave this without testing escaping... The test is large enough as it is.

tm added a subscriber: tm.Nov 24 2014, 8:01 AM

Why limit this to double quote only? According to our specification both '"'
and '\' are special characters, thus both of them should have been escaped when
occurring in context where they have special meaning but literal meaning is
intended.

Consider for example ninja build command that executes /bin/echo with single
backslash as an argument:

$ cat build.ninja 
rule cc
  command = /bin/echo "\\"
build test: cc test.c
$ ninja -t compdb cc
[
  {
    "directory": "/tmp/",
    "command": "/bin/echo \"\\\\\"",
    "file": "test.c"
  }
]
tm added a comment.Nov 24 2014, 9:51 AM

And then quotes, backslash and space character all have special meaning outside
quotes. So probably there should be separate implementation of skipEscape for
unquoted string. Sorry for the noise.

That's an interesting discovery about ninja. On Windows:

D:\>more build.ninja
rule cc

command = /bin/echo "\\" $in

build test: cc dir\test.c

D:\>ninja -t compdb cc
[

{
  "directory": "D:\\",
  "command": "/bin/echo \"\\\\\" dir\\test.c",
  "file": "dir\\test.c"
}

]

So ninja isn't even internally consistent! I'll file a bug against ninja.
That being said, I'm not convinced that it makes sense to handle anything
other than the literal \" specially--we still need to skip processing every
other 'normal' escape sequence, so we may as well limit this special
behavior to the combination of the two.

klimek added a comment.Jul 3 2015, 6:51 AM

Is this still needed given that we'll want to go the separate args route?