Details
- Reviewers
zturner llvm-commits labath serge-sans-paille - Commits
- rG981e63581a9a: [gdb-remote] Use lldb's portable Host::GetEnvironment() instead of getenv
rL353440: [gdb-remote] Use lldb's portable Host::GetEnvironment() instead of getenv
rLLDB353440: [gdb-remote] Use lldb's portable Host::GetEnvironment() instead of getenv
Diff Detail
Event Timeline
Forgive my ignorance, but what makes getenv unportable? llvm uses in a bunch of places so it can't be that bad... Is it some wide string thingy?
Regardless, using the lldb function for that seems fine, but if I remember correctly, each GetEnvironment call causes us to reparse the entire environment block. So it would be nice to just call it once in this function and then query that object for what we need.
source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp | ||
---|---|---|
1133 | This change is essential others seem not that necessary but nice to have in order to support multiple systems. |
source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp | ||
---|---|---|
1133 | I would prefer to avoid using C-style formatting if possible. LLVM has some nice formatting routines that are safer and easier to understand. While we're here, perhaps we could change char arg_cstr[PATH_MAX]; up at the top of this function to std::string arg_str; Regardless, we can re-write this in a number of ways, such as: arg_str = llvm::formatv("--log-channels = \"{0}\", env_debugserver_log_channels).str(); arg_str = (Twine("--log-channels = \"") + env_debugserver_log_channels + "\"").str(); arg_str = llvm::format("--log-channels = \"%s\"", env_debugserver_log_channels); to get rid of this snprintf call. |
source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp | ||
---|---|---|
1133 | Actually, this shouldn't be done here at all. The assumption here is that the contents of the Args vector will get passed verbatim to the argc+argv of the called program. On unix systems we will do just that, so if you add the quotes here they will be passed the lldb-server, and then the parse there will fail because we will get extra quotes we didn't expect. Since Windows needs special logic in order to ensure this, that logic should be inside windows-specific code, probably ProcessLauncherWindows. That will make all other instances of launching processes with spaces work, not just this particular case. It seems ProcessLauncherWindows uses Args::GetQuotedCommandString to do this job, but this function seems extremely under-equipped for this job. I'd suggest implementing a windows-specific version of this in ProcessLauncherWindows, which will do whatever fixups are necessary to make this happen. |
source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp | ||
---|---|---|
1133 | It is not that applicable for the windows process launcher to determine which entry in the args needs to be quoted unless given very specific flag or option. I think this issue shall be documented and ask user to be responsible for setting the "right" variables. For example, on Windows PS, instead of on MSVC, instead of |
It is not that applicable for the windows process launcher to determine which entry in the args needs to be quoted unless given very specific flag or option.
Why not? Given the argv parsing rules described here https://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017, it sounds like it should be possible to create an algorithm doing the reverse mapping.
Something like this ought to do the trick:
for(string: Args) { if (string.contains_either_of(" \t\"") { double_the_amount_of_backslashes_in_front_of_every_quote_char(string); string = '"' + string '"'; } cmdline += " " + string; }
I think the key problem here is to make sure the argument will be treated as a single argument to the process launcher.
To be specific to this case only, could we just provide a quote char to argument log file path and log channels on Windows?
The downside is one more #if is introduced.
#ifdef _WIN32
char quote_char= '"';
#else
char quote_char='\0';
#endif
std::string env_debugserver_log_channels = host_env.lookup("LLDB_SERVER_LOG_CHANNELS"); if (!env_debugserver_log_channels.empty()) { debugserver_args.AppendArgument( llvm::formatv("--log-channels={0}", env_debugserver_log_channels) .str(), quote_char); }
I almost feel like the Args class could be made smarter here. Because all of the proposed solutions will still not work correctly on Linux. For example, why is this Windows specific? Args::GetCommandString() is very dumb and just loops over each one appending to a string. So if your log file name contains spaces on Linux, you would still have a problem no?
Currently the signature of AppendArgument() takes an argument string, and a quote char, which defaults to '\0' which means no quote. But there is only one call-site of that entire function out of many hundreds of other calls to the function, most of which just pass nothing. This means that all of those other call-sites are broken if their argument contains a space (many are string literals though, so it won't be a problem there).
But why don't we just split this into two functions:
Args::QuoteAndAppendArgument(const Twine &arg); Args::AppendArgument(const Twine &arg);
and the first one will auto-detect the quoting style to use based on the host operating system. This way we would fix the problem for all platforms, not just Windows, and we could also use the function to fix potentially many other bugs at all the callsites where we append unquoted arguments.
Can you elaborate on that? I still don't see what is the problem with the solution I proposed.
To be specific to this case only, could we just provide a quote char to argument log file path and log channels on Windows?
The downside is one more #if is introduced.
Yes, but this is not a problem specific to this code. Plenty of code appends strings to the Args class which may contain spaces all of those would be wrong if they make it's way to the windows process launcher, so it's logical to fix the problem there, as it all affected code paths have to go through there and it alone has enough information to make the right decision on how to quote things.
There won't be any problem with space (or any other character for that matter) on linux, because we don't use GetCommandString on linux. We don't use it for two reasons:
a) we don't need to -- we just pass the individual strings in the Args array as an argv vector into execve
b) GetCommandString is fundamentally broken. It uses random quoting rules which aren't even deterministically reversible. It will probably be right in most situations when the string contains spaces, but things will blow up as soon as the string starts to contain quote chars and backslashes.
Currently the signature of AppendArgument() takes an argument string, and a quote char, which defaults to '\0' which means no quote. But there is only one call-site of that entire function out of many hundreds of other calls to the function, most of which just pass nothing. This means that all of those other call-sites are broken if their argument contains a space (many are string literals though, so it won't be a problem there).
But why don't we just split this into two functions:
Args::QuoteAndAppendArgument(const Twine &arg); Args::AppendArgument(const Twine &arg);and the first one will auto-detect the quoting style to use based on the host operating system. This way we would fix the problem for all platforms, not just Windows, and we could also use the function to fix potentially many other bugs at all the callsites where we append unquoted arguments.
I fundamentally disagree with this approach. There's a reason why most call sites do not use the quote_char, and that's because it's impossible to use correctly. The quoting character, and quoting rules are in general not known at the place where we build the Args vector. Specifically, they are not a property of the host system. They're a property of the entity you are about to pass the quoted string to. So you need to quote the string differently depending on whether you are going to pass it to:
- a posix-style syscall like execve
- windows CreateProcess
- the lldb builtin interpreter
- windows cmd.exe
- a posix shell (there might be even variations between the shells here)
There's no way all of this can be encompassed in a single char variable. Using the "host" rules would be the right solution here, but it would be wrong pretty much everywhere else, because you never know if those Args are going to end up being transmitted over the network and executed on a different system with different rules.
That's why I think the only reasonable solution here is to ignore the quote char here like everyone else is doing already (we should eventually remove it). Instead we should have separate functions like QuoteForCreateProcess, QuoteForLLDBInterpreter, ... (I don't care whether they live in the Args class or not), and have each consumer responsible for picking the right one. So, ProcessLauncherWindows would call QuoteForCreateProcess, which would apply any transformations necessary, pass the result to CreateProcess and be done.
For example, for a Args vector like lldb-server, gdb-remote, --log-channels=foo\\\ \\\""" ''', whatever, QuoteForCreateProcess would return
lldb-server gdb-remote "--log-channels=foo\\\ \\\\\\\"\"\" '''" whatever (there are other ways to quote this too). Passing this string to CreateProcess will result in the original vector being available to the main function of lldb-server, which is what this code (and all other code that works with the Args class) expects.
Btw, there is already code for doing this in llvm (llvm::sys::flattenWindowsCommandLine), so we can just steal the implementation from there.
What do you think of the following codes to be added in Args?
bool Args::GetFlattenQuotedCommandString(std::string &command) const { std::vector<llvm::StringRef> args_ref; std::vector<std::string> owner; for (size_t i = 0; i < m_entries.size(); ++i) { if (m_entries[i].quote) { std::string arg; arg += m_entries[i].quote; arg += m_entries[i].ref; arg += m_entries[i].quote; owner.push_back(arg); args_ref.push_back(owner.back()); } else { args_ref.push_back(m_entries[i].ref); } } command = llvm::sys::flattenWindowsCommandLine(args_ref); return !m_entries.empty(); }
Yes, that's more-or-less what I had in mind. I have some comments about the implementation, but none of them are fundamental, I think:
- add "windows" to the function name somewhere (GetFlattenedWindowsCommandString?, GetWindowsCommandString?, ...), as function uses quoting style which is specifically meant for windows, and is unlikely to be correct elsewhere
- drop the if (m_entries[i].quote) branch. You don't need it here, and I don't believe it will be correct anyway, because all it will do is cause llvm::sys::flattenWindowsCommandLine to add one more quote level around that and escape the quotes added by yourself
- drop the if (m_entries[i].quote) branch. You don't need it here, and I don't believe it will be correct anyway, because all it will do is cause llvm::sys::flattenWindowsCommandLine to add one more quote level around that and escape the quotes added by yourself
The quote char might be specified through Args::AppendArgument at user's will.
Yes, but do you know what the user meant when he set it? I don't, but I'm pretty sure he did not mean it to be passed verbatim to the launched process. It certainly isn't what will happen on posix systems as there we just ignore it. Maybe nobody knows, which is why nobody sets it? I think it would be best to be consistent with posix systems, and ignore it here too (at least until we have a good reason to do otherwise).
I've always disliked this argument and hoped that someday someone would remove it entirely. My recollection (which may be wrong) is that the only actual use of it is so that if someone types a command, and we later need to print the command back, we will print it with the same quote char. It almost seems like we could just delete the argument and use a standardized quote char when flattening a command string.
+100
BTW, today I've tried to switch ProcessLauncherWindows to flattenWindowsCommandLine and this change alone was enough to fix TestQuoting, which has some tests XFAILed for windows due to quoting problems. I haven't sent out a patch yet because that has also broken platform shell dir c:\ for some reason, and I haven't gotten around to investigating that. If you (for any value of you) have some time, I'd encourage you to look into that. Otherwise, I'll get to that eventually.
Yes. I myself added a patch to test quoting also launch gdbserver by lldb-server.exe. Ran well so far. Also lldb/unittests/Utility tests.
Haven't tried other regression tests yet.
+#if defined(_WIN32)
+TEST(ArgsTest, GetFlattenWindowsCommandString) {
+ Args args;
+ args.AppendArgument("D:\\launcher.exe");
+ args.AppendArgument("--log=abc def");
+
+ std::string stdstr;
+ ASSERT_TRUE(args.GetFlattenWindowsCommandString(stdstr));
+ EXPECT_EQ(stdstr, "\"D:\\launcher.exe\" \"--log=abc def\"");
+}
+#endif
BTW, if this is going to end up #ifdefed (I know you need to do that because flattenWindowsCommandLine is #ifdef _WIN32), then I believe it should go somewhere under Host/windows. It's not used outside of windows code anyway...
We could put it in Args.h, but then we'd need to un-ifdef the llvm function (which I believe we could do, but we'd need to get that ok'd by the llvm folks).
It could be the llvm::sys::flattenWindowsCommandLine issue to flatten the command “dir c:\" for Windows Command Terminal.
However it is not an issue for PS or MingGW.
It is observed the flattened one is
"\"C:\\WINDOWS\\system32\\cmd.exe\" /C \" dir c:\\\\\" "
which will be interpreted as the following that is not accepted by CMD.exe.(dir c:\ or dir c:\.\ is fine. There is no '\' directory or file on my Drive c).
However it is accepted by PS and MingGW.
"C:\\WINDOWS\\system32\\cmd.exe" /C " dir c:\\"
include/lldb/Utility/Args.h | ||
---|---|---|
121 ↗ | (On Diff #182704) | I am sorry for being such a pain, but I don't think this is grammatically correct, as get and flatten are both verbs. I'd call this either GetFlattenedWindowsCommandLine or just plain FlattenWindowsCommandLine. |
source/Utility/Args.cpp | ||
252 ↗ | (On Diff #182704) | This will not compile on non-windows systems as flattenWindowsCommandLine is #ifdef _WIN32. We can either try removing the ifdefs from the llvm function, or put this code somewhere under Host/windows (it could just be a static function in ProcessLauncherWindows). |
include/lldb/Utility/Args.h | ||
---|---|---|
121 ↗ | (On Diff #182704) | There are two kinds of sources of args in this discussion:
We expect to call a GetFlattenedWindowsCommandString for raw input. However it seems not the case for now. What about adding a TODO in the following in ProcessWindowsLauncher. Would like a solution to proceed. + |
include/lldb/Utility/Args.h | ||
---|---|---|
121 ↗ | (On Diff #182704) | I am afraid you lost me here. By the time something gets to this function, it has already been parsed into individual argv strings. I am not sure which ones do you consider raw, or why should it make a difference. (lldb builtin interpreter has a concept of "raw" commands, but I don't think that's what you mean here) This function should take those argv strings, no matter what they are, and recompose them into a single string. If those strings are wrong, then it's output will also be wrong, but that's not a problem of this function -- it's a problem of whoever parsed those strings. I won't have access to a windows machine this week to check this out, but I can take a look at that next week. In the mean time, I would be fine with just xfailing the single test which fails because of this. I think that's a good tradeoff, as this would fix a bunch of other tests as well as unblock you on your lldb-server work. |
include/lldb/Utility/Args.h | ||
---|---|---|
121 ↗ | (On Diff #182704) | I can't test this out, but the place I'd expect the problem to be is in ProcessLaunchInfo::ConvertArgumentsForLaunchingInShell. This function seems to be blissfully unaware of the windows quoting rules, and yet it's the one that turns platform shell dir c:\ into cmd, /c whatever. The flatten function then gets this argv array and flattens it one more time. I am not sure if that's the right thing to do on windows. |
include/lldb/Utility/Args.h | ||
---|---|---|
121 ↗ | (On Diff #182704) | It is observed when you type "dir c:\" on lldb console, the IOHandlerEditline::GetLine will yield "dir c:\\" for you through the standard input (I skipped the new line char '\n'). And the CommandInterpreter::ResolveCommandImpl won't get the raw command line string, i mean "dir c:\", even I force WantsRawCommandString() true to get one. |
include/lldb/Utility/Args.h | ||
---|---|---|
121 ↗ | (On Diff #182704) | How are you observing that? Are you sure that's what happens, and it's not just the lldb formatting of c strings (when lldb displays a const char * variable, it will automatically escape any backslashes within, which means the backslashes will appear doubled). I'd be surprised if extra backslashes appear at this level (they certainly don't on linux), as plenty of other things would fail if this didn't work. |
include/lldb/Utility/Args.h | ||
---|---|---|
121 ↗ | (On Diff #182704) | I set the breakpoint through MSVC, triggered it by typing the command "dir c:\" on lldb console and the value I read was "dir c:\\\n". |
include/lldb/Utility/Args.h | ||
---|---|---|
121 ↗ | (On Diff #182704) | Isn't MSVC doing the same thing here? Otherwise, how would you be able to tell a line feed character ("\n") from a literal backslash followed by the letter N? |
include/lldb/Utility/Args.h | ||
---|---|---|
121 ↗ | (On Diff #182704) | My understanding of the IOHandlerEditline::GetLine function is it calls fgets to get the input string which is stored as "dir c:\\\n" and the tailing char ('\n') is stripped out intentionally so a "dir c:\\" is the leftover. |
Hello labath, I am thinking maybe it is just because the windows command 'dir' can't interpret "\\" in the root directory path. For example,
This is working
"C:\\WINDOWS\\system32\\cmd.exe" /C " dir c:\Users\\abc"
This is not working
"C:\\WINDOWS\\system32\\cmd.exe" /C " dir c:\\Users\\abc"
Other commands, like ls seems not have this problem.
So could we just go ahead with llvm::sys::flattenWindowsCommandLine but set a red light on 'dir' command only, at least handle it specifically?
Sorry about the delay.
I've done some experiments of my own, and my conclusion is that the "special" part here is not the "dir" command but cmd.exe itself. It handles quotes differently than typical windows applications. It behavior is documented like this:
1. If all of the following conditions are met, then quote characters on the command line are preserved: - no /S switch - exactly two quote characters - no special characters between the two quote characters, where special is one of: &<>()@^| - there are one or more whitespace characters between the the two quote characters - the string between the two quote characters is the name of an executable file. 2. Otherwise, old behavior is to see if the first character is a quote character and if so, strip the leading character and remove the last quote character on the command line, preserving any text after the last quote character.
So given these rules, and the lldb command platform shell dir c:\, the correct string to pass to CreateProcess would be cmd.exe /c "dir c:\". Conceptually, that is easy to achieve, but it practice that's a bit hard because the road from "platform shell" to CreateProcess is long and winding:
- first, the string dir c:\ gets converted to a Args array in Host::RunShellCommand (the const char * version). This step uses the "lldb" quoting rules and results in something like dir, c:\.
- then the Args array gets transmogrified in ProcessLaunchInfo::ConvertArgumentsForLaunchingInShell. This uses some approximation of the posix shell quoting rules, but I believe that is not fully correct for posix shells either. The result is cmd.exe, /c, dir c:\ (IIRC).
- finally, we call flattenWindowsCommandLine. This uses the standard windows quoting rules, which gives us the string `cmd.exe /c "dir c:\\"
So, I think the proper solution here would be to find a way to skip steps 1. and 2. They're completely unnecessary even in the posix case -- the user types a string, the shell expects a string, we should be able to just pass that string from one place to the other and not play around with parsing and reconstituting it. Then, it will be easy to construct the proper command line for windows, or the right argv array in the posix case. This will probably involve having ProcessLaunchInfo be backed by either an argv array or a plain string.
However, this is getting pretty far from the original intention of your patch. For the time being I would be happy with checking in the present version of your patch, with the knowledge that this breaks some "platform shell" use cases. I don't think that's too bad, because "platform shell" is not the most important feature in lldb, and it's not fully correct now anyway. In return for that, we would get all tests in TestQuoting.py passing.
Thanks for the feedback and interesting discussion! If no more comments, let's go with this version for now.
OK, sounds good to me then. Just please make sure the update the test decorators before committing this (I expect you'll need to remove @xfailwindows from TestQuoting and add it to the dir c:\ test (I forgot the name).
This change is essential others seem not that necessary but nice to have in order to support multiple systems.