Bugzilla #47579: if you invoke clang on Windows via a pathname in
which a quoted section closes just after a backslash, e.g.
"C:\Program Files\Whatever\"clang.exe
then cmd.exe and CreateProcess will correctly find the binary, because
when they parse the program name at the start of the command line,
they don't regard the \ before the " as having any kind of escaping
effect. This differs from the behaviour of the Windows standard C
library when it parses the rest of the command line: there, that \"
would be an escaped literal quote and would not close the quoted
string.

But this confuses windows::GetCommandLineArguments, because the
Windows API function GetCommandLineW() will return a command line
containing that \" sequence, and cl::TokenizeWindowsCommandLine will
tokenize the whole string according to the C library's rules. As a
result, it misidentifies where the program name ends and the
arguments begin.
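
To illustrate the misparse, here is a minimal standalone sketch, not the
LLVM code. The helper firstTokenCrtRules is hypothetical, and it applies a
simplified form of the C library's backslash and quote rules (it ignores
the CRT's "" handling inside quoted strings) to the first word of a
command line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: reads the first token of a
// command line under simplified CRT rules, where 2n backslashes before a
// quote yield n backslashes and toggle quoting, and 2n+1 backslashes before
// a quote yield n backslashes plus a literal quote.
std::string firstTokenCrtRules(const std::string &Cmd) {
  std::string Tok;
  bool InQuotes = false;
  std::size_t I = 0, E = Cmd.size();
  while (I < E) {
    char C = Cmd[I];
    if (C == '\\') {
      // Count the run of backslashes, then look at what follows it.
      std::size_t N = 0;
      while (I < E && Cmd[I] == '\\') {
        ++N;
        ++I;
      }
      if (I < E && Cmd[I] == '"') {
        Tok.append(N / 2, '\\');
        if (N % 2)
          Tok.push_back('"'); // escaped quote: quoting state is unchanged
        else
          InQuotes = !InQuotes;
        ++I; // consume the quote
      } else {
        Tok.append(N, '\\'); // backslashes not before a quote are literal
      }
      continue;
    }
    if (C == '"') {
      InQuotes = !InQuotes;
      ++I;
      continue;
    }
    if (!InQuotes && (C == ' ' || C == '\t'))
      break; // unquoted whitespace ends the token
    Tok.push_back(C);
    ++I;
  }
  return Tok;
}
```

Under these rules the \" in the example path never closes the quoted
section, so the "program name" runs on and swallows the arguments that
follow it.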

To fix this, I've introduced a new variant function
cl::TokenizeWindowsCommandLineFull(), intended to be applied to the
string returned from GetCommandLineW(). It parses the first word of
the command line according to CreateProcess's rules, never treating \
as an escape character; thereafter, it switches over to the C library
rules for the rest of the command line.
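
The two-phase approach can be sketched roughly as follows. This is a
standalone illustration over std::string, not the actual
cl::TokenizeWindowsCommandLineFull() implementation. The name tokenizeFull
is hypothetical, the CRT rules are simplified (no "" handling inside
quotes), and the program name is always emitted as the first token, even
for an empty input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: program name by CreateProcess-style rules, then the
// remaining arguments by simplified CRT rules.
std::vector<std::string> tokenizeFull(const std::string &Cmd) {
  std::vector<std::string> Args;
  std::size_t I = 0, E = Cmd.size();
  auto IsSpace = [](char C) { return C == ' ' || C == '\t'; };

  // Phase 1: program name. Quotes toggle quoting; backslash is ordinary.
  std::string Tok;
  bool InQuotes = false;
  for (; I < E; ++I) {
    char C = Cmd[I];
    if (C == '"') {
      InQuotes = !InQuotes;
      continue;
    }
    if (!InQuotes && IsSpace(C))
      break;
    Tok.push_back(C);
  }
  Args.push_back(Tok); // kept even if a quote is still open

  // Phase 2: remaining arguments, where backslashes can escape quotes.
  while (true) {
    while (I < E && IsSpace(Cmd[I]))
      ++I;
    if (I == E)
      break;
    Tok.clear();
    InQuotes = false;
    for (; I < E; ++I) {
      char C = Cmd[I];
      if (C == '\\') {
        std::size_t N = 0;
        while (I < E && Cmd[I] == '\\') {
          ++N;
          ++I;
        }
        if (I < E && Cmd[I] == '"') {
          Tok.append(N / 2, '\\');
          if (N % 2)
            Tok.push_back('"'); // odd count: literal quote
          else
            InQuotes = !InQuotes; // even count: quote toggles quoting
        } else {
          Tok.append(N, '\\');
          --I; // let the loop increment revisit the current character
        }
        continue;
      }
      if (C == '"') {
        InQuotes = !InQuotes;
        continue;
      }
      if (!InQuotes && IsSpace(C))
        break;
      Tok.push_back(C);
    }
    Args.push_back(Tok); // the final token is kept even if unclosed
  }
  return Args;
}
```

With this sketch, the example path above yields
C:\Program Files\Whatever\clang.exe as the first token, and a trailing
argument whose quote is never closed is still returned rather than
dropped.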

In the case where there are no further " characters on the command
line, the previous code also returned no arguments at all, causing an
assertion failure later in GetCommandLineArguments when it tried to
refer to Args[0]. That occurred because the tokenizer threw away the
final token if it contained a quoted string still unclosed at the end
of the command line, which also doesn't match the Windows CRT's
handling. So I've fixed that too and, for extra safety, added a final
check before dereferencing Args[0], so that if we still somehow get a
zero-length argument vector, we at least won't crash.
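
The defensive check could look something like the sketch below. The
function name firstArgument and the choice of
std::errc::invalid_argument are assumptions made for illustration, not
necessarily what the patch does.

```cpp
#include <string>
#include <system_error>
#include <vector>

// Hypothetical guard: report an error instead of indexing into an empty
// argument vector. The error value chosen here is an assumption.
std::error_code firstArgument(const std::vector<std::string> &Args,
                              std::string &Out) {
  if (Args.empty())
    return std::make_error_code(std::errc::invalid_argument);
  Out = Args[0]; // safe: Args has at least one element
  return std::error_code(); // default-constructed value means success
}
```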

This seems like it could be a bug when the command line ends with an open quoted string. We should match the CRT logic. In the original example, the entire command line would become argv[0].