This is an archive of the discontinued LLVM Phabricator instance.

Handle consecutive-double-quotes in Windows argument parsing
ClosedPublic

Authored by Sunil_Srivastava on Feb 25 2019, 5:58 PM.

Details

Summary

Windows command line argument processing treats consecutive double quotes
as a single double-quote. This patch implements this functionality.

Fixes PR39506.

Diff Detail

Event Timeline

Sunil_Srivastava added a comment.EditedFeb 25 2019, 8:16 PM

In terms of practical effect, this patch allows users to do, on Windows,

clang ... -DFOO="""ABC"""

on a file having

const char *p = FOO;
rnk accepted this revision.Feb 27 2019, 4:08 PM

lgtm

I experimented with this program to confirm the behavior matches MSVCRT:

#include <stdio.h>
extern "C" const char *GetCommandLineA(void);
int main(int argc, char **argv) {
  puts(GetCommandLineA());
  for (int i = 0; i < argc; ++i) {
    puts(argv[i]);
  }
}

C:\src\llvm-project\build>t.exe -DFOO="""ABC"""
t.exe  -DFOO="""ABC"""
t.exe
-DFOO="ABC"

C:\src\llvm-project\build>t.exe -DFOO=""" A BC"""
t.exe  -DFOO=""" A BC"""
t.exe
-DFOO=" A BC"

C:\src\llvm-project\build>t.exe tok1 -DFOO=""" A BC""" tok3
t.exe  tok1 -DFOO=""" A BC""" tok3
t.exe
tok1
-DFOO=" A BC"
tok3

C:\src\llvm-project\build>t.exe tok1 -DFOO="" A BC""" tok3
t.exe  tok1 -DFOO="" A BC""" tok3
t.exe
tok1
-DFOO=
A
BC" tok3
This revision is now accepted and ready to land.Feb 27 2019, 4:08 PM

Thanks for the lgtm Reid, but unfortunately our internal testing found a case where this path causes trouble.

Specifically ( with same string initialization that I listed)

clang -c -DFOO=\""\\\\\\""\\"\" p.cpp

used to work but it but it fails with this patch. Backslashes need to be recognized specially.

So I will have to update this review after more study and experimentation.

Sunil_Srivastava added a comment.EditedMar 1 2019, 11:19 AM

Hi Reid,

Update:

We did find two cases which 'work' before my proposed patch and will fail after that, but on further study we think they are defective test cases. Our behavior, with this proposed patch will be similar to the Microsoft compiler on Windows hosts.

Still, since these cases will start failing with this patch, I am going to describe them here.


Case 1:

clang -c -DFOO=\""\\\\\\""\\"\" p.cpp

This currently compiles with FOO becoming a string of four backslashes, made of first set of three and the second set of one.

With the proposed patch it will fail. -E will show (with the source file listed above):

const char *p = "\\\"\";

The Microsoft compiler does exactly the same thing.

It currently works because the consecutive-DQs (after six backslashes) terminate the first string and start the next, therefore disappearing themselves. With this patch they will survive on as one DQ.

Case 2:

clang -c """" p.cpp

This currently works because four DQs become two empty strings, further concatenated to become one empty string, which gets filtered out leaving just p.cpp.

With the proposed patch four DQs will become a string of one DQ, which is not a valid argument, so we will get the error:

clang: error: no such file or directory: '"'

In this case, the Microsoft compiler survives with a warning, but only because it allows any unrecognized source file type with just a warning:

$ cl  /nologo -c """" p.cpp
cl : Command line warning D9024 : unrecognized source file type '"', object file assumed
cl : Command line warning D9027 : source file '"' ignored
$ cl  /nologo -c junk p.cpp
cl : Command line warning D9024 : unrecognized source file type 'junk', object file assumed
cl : Command line warning D9027 : source file 'junk' ignored

The clang driver is just more strict about this.


So, in summary, I believe this patch is valid, and if you still stand by your LGTM, I will be happy to check it in.

rnk added a comment.Mar 14 2019, 9:29 AM

I agree with your analysis, thanks!

This revision was automatically updated to reflect the committed changes.