Page MenuHomePhabricator

[gdb-remote] Use lldb's portable Host::GetEnvironment() instead of getenv
Needs ReviewPublic

Authored by asmith on Wed, Jan 2, 4:10 PM.

Details

Diff Detail

Event Timeline

asmith created this revision.Wed, Jan 2, 4:10 PM
labath added a subscriber: labath.

Forgive my ignorance, but what makes getenv unportable? llvm uses in a bunch of places so it can't be that bad... Is it some wide string thingy?

Regardless, using the lldb function for that seems fine, but if I remember correctly, each GetEnvironment call causes us to reparse the entire environment block. So it would be nice to just call it once in this function and then query that object for what we need.

Hui added a subscriber: Hui.Fri, Jan 4, 2:01 PM
Hui added inline comments.
source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
1145

This change is essential others seem not that necessary but nice to have in order to support multiple systems.

zturner added inline comments.Fri, Jan 4, 2:37 PM
source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
1145

I would prefer to avoid using C-style formatting if possible. LLVM has some nice formatting routines that are safer and easier to understand.

While we're here, perhaps we could change char arg_cstr[PATH_MAX]; up at the top of this function to std::string arg_str;

Regardless, we can re-write this in a number of ways, such as:

arg_str = llvm::formatv("--log-channels = \"{0}\", env_debugserver_log_channels).str();
arg_str = (Twine("--log-channels = \"") + env_debugserver_log_channels + "\"").str();
arg_str = llvm::format("--log-channels = \"%s\"", env_debugserver_log_channels);

to get rid of this snprintf call.

labath requested changes to this revision.Mon, Jan 7, 2:34 AM
labath added inline comments.
source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
1145

Actually, this shouldn't be done here at all. The assumption here is that the contents of the Args vector will get passed verbatim to the argc+argv of the called program. On unix systems we will do just that, so if you add the quotes here they will be passed the lldb-server, and then the parse there will fail because we will get extra quotes we didn't expect.

Since Windows needs special logic in order to ensure this, that logic should be inside windows-specific code, probably ProcessLauncherWindows. That will make all other instances of launching processes with spaces work, not just this particular case. It seems ProcessLauncherWindows uses Args::GetQuotedCommandString to do this job, but this function seems extremely under-equipped for this job. I'd suggest implementing a windows-specific version of this in ProcessLauncherWindows, which will do whatever fixups are necessary to make this happen.

This revision now requires changes to proceed.Mon, Jan 7, 2:34 AM
Hui added inline comments.Tue, Jan 8, 2:58 PM
source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp
1145

It is not that applicable for the windows process launcher to determine which entry in the args needs to be quoted unless given very specific flag or option. I think this issue shall be documented and ask user to be responsible for setting the "right" variables. For example,

on Windows PS,
$env:LLDB_SERVER_LOG_CHANNELS="""lldb all:windows all"""

instead of
$env:LLDB_SERVER_LOG_CHANNELS="lldb all:windows all"

on MSVC,
LLDB_DEBUGSERVER_LOG_FILE="d:\\a b c\\log.txt"

instead of
LLDB_DEBUGSERVER_LOG_FILE=d:\\a b c\\log.txt

labath added a comment.Wed, Jan 9, 5:29 AM

It is not that applicable for the windows process launcher to determine which entry in the args needs to be quoted unless given very specific flag or option.

Why not? Given the argv parsing rules described here https://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017, it sounds like it should be possible to create an algorithm doing the reverse mapping.
Something like this ought to do the trick:

for(string: Args) {
  if (string.contains_either_of(" \t\"")  {
    double_the_amount_of_backslashes_in_front_of_every_quote_char(string);
    string = '"' + string '"';
  }
  cmdline += " " + string;
}
Hui added a comment.Fri, Jan 11, 2:01 PM

I think the key problem here is to make sure the argument will be treated as a single argument to the process launcher.

To be specific to this case only, could we just provide a quote char to argument log file path and log channels on Windows?

The downside is one more #if is introduced.

#ifdef _WIN32
char quote_char= '"';
#else
char quote_char='\0';
#endif

 std::string env_debugserver_log_channels =
     host_env.lookup("LLDB_SERVER_LOG_CHANNELS");
 if (!env_debugserver_log_channels.empty()) {
   debugserver_args.AppendArgument(
       llvm::formatv("--log-channels={0}", env_debugserver_log_channels)
           .str(), quote_char);
}

It is not that applicable for the windows process launcher to determine which entry in the args needs to be quoted unless given very specific flag or option.

Why not? Given the argv parsing rules described here https://docs.microsoft.com/en-us/cpp/cpp/parsing-cpp-command-line-arguments?view=vs-2017, it sounds like it should be possible to create an algorithm doing the reverse mapping.
Something like this ought to do the trick:

for(string: Args) {
  if (string.contains_either_of(" \t\"")  {
    double_the_amount_of_backslashes_in_front_of_every_quote_char(string);
    string = '"' + string '"';
  }
  cmdline += " " + string;
}
In D56230#1354829, @Hui wrote:

I think the key problem here is to make sure the argument will be treated as a single argument to the process launcher.

To be specific to this case only, could we just provide a quote char to argument log file path and log channels on Windows?

The downside is one more #if is introduced.

#ifdef _WIN32
char quote_char= '"';
#else
char quote_char='\0';
#endif

 std::string env_debugserver_log_channels =
     host_env.lookup("LLDB_SERVER_LOG_CHANNELS");
 if (!env_debugserver_log_channels.empty()) {
   debugserver_args.AppendArgument(
       llvm::formatv("--log-channels={0}", env_debugserver_log_channels)
           .str(), quote_char);
}

I almost feel like the Args class could be made smarter here. Because all of the proposed solutions will still not work correctly on Linux. For example, why is this Windows specific? Args::GetCommandString() is very dumb and just loops over each one appending to a string. So if your log file name contains spaces on Linux, you would still have a problem no?

Currently the signature of AppendArgument() takes an argument string, and a quote char, which defaults to '\0' which means no quote. But there is only one call-site of that entire function out of many hundreds of other calls to the function, most of which just pass nothing. This means that all of those other call-sites are broken if their argument contains a space (many are string literals though, so it won't be a problem there).

But why don't we just split this into two functions:

Args::QuoteAndAppendArgument(const Twine &arg);
Args::AppendArgument(const Twine &arg);

and the first one will auto-detect the quoting style to use based on the host operating system. This way we would fix the problem for all platforms, not just Windows, and we could also use the function to fix potentially many other bugs at all the callsites where we append unquoted arguments.

asmith updated this revision to Diff 181377.Fri, Jan 11, 3:21 PM
asmith marked 4 inline comments as done.Fri, Jan 11, 3:45 PM
In D56230#1354829, @Hui wrote:

I think the key problem here is to make sure the argument will be treated as a single argument to the process launcher.

Can you elaborate on that? I still don't see what is the problem with the solution I proposed.

To be specific to this case only, could we just provide a quote char to argument log file path and log channels on Windows?

The downside is one more #if is introduced.

Yes, but this is not a problem specific to this code. Plenty of code appends strings to the Args class which may contain spaces all of those would be wrong if they make it's way to the windows process launcher, so it's logical to fix the problem there, as it all affected code paths have to go through there and it alone has enough information to make the right decision on how to quote things.

I almost feel like the Args class could be made smarter here. Because all of the proposed solutions will still not work correctly on Linux. For example, why is this Windows specific? Args::GetCommandString() is very dumb and just loops over each one appending to a string. So if your log file name contains spaces on Linux, you would still have a problem no?

There won't be any problem with space (or any other character for that matter) on linux, because we don't use GetCommandString on linux. We don't use it for two reasons:
a) we don't need to -- we just pass the individual strings in the Args array as an argv vector into execve
b) GetCommandString is fundamentally broken. It uses random quoting rules which aren't even deterministically reversible. It will probably be right in most situations when the string contains spaces, but things will blow up as soon as the string starts to contain quote chars and backslashes.

Currently the signature of AppendArgument() takes an argument string, and a quote char, which defaults to '\0' which means no quote. But there is only one call-site of that entire function out of many hundreds of other calls to the function, most of which just pass nothing. This means that all of those other call-sites are broken if their argument contains a space (many are string literals though, so it won't be a problem there).

But why don't we just split this into two functions:

Args::QuoteAndAppendArgument(const Twine &arg);
Args::AppendArgument(const Twine &arg);

and the first one will auto-detect the quoting style to use based on the host operating system. This way we would fix the problem for all platforms, not just Windows, and we could also use the function to fix potentially many other bugs at all the callsites where we append unquoted arguments.

I fundamentally disagree with this approach. There's a reason why most call sites do not use the quote_char, and that's because it's impossible to use correctly. The quoting character, and quoting rules are in general not known at the place where we build the Args vector. Specifically, they are not a property of the host system. They're a property of the entity you are about to pass the quoted string to. So you need to quote the string differently depending on whether you are going to pass it to:

  • a posix-style syscall like execve
  • windows CreateProcess
  • the lldb builtin interpreter
  • windows cmd.exe
  • a posix shell (there might be even variations between the shells here)

There's no way all of this can be encompassed in a single char variable. Using the "host" rules would be the right solution here, but it would be wrong pretty much everywhere else, because you never know if those Args are going to end up being transmitted over the network and executed on a different system with different rules.

That's why I think the only reasonable solution here is to ignore the quote char here like everyone else is doing already (we should eventually remove it). Instead we should have separate functions like QuoteForCreateProcess, QuoteForLLDBInterpreter, ... (I don't care whether they live in the Args class or not), and have each consumer responsible for picking the right one. So, ProcessLauncherWindows would call QuoteForCreateProcess, which would apply any transformations necessary, pass the result to CreateProcess and be done.

For example, for a Args vector like lldb-server, gdb-remote, --log-channels=foo\\\ \\\""" ''', whatever, QuoteForCreateProcess would return
lldb-server gdb-remote "--log-channels=foo\\\ \\\\\\\"\"\" '''" whatever (there are other ways to quote this too). Passing this string to CreateProcess will result in the original vector being available to the main function of lldb-server, which is what this code (and all other code that works with the Args class) expects.

For example, for a Args vector like lldb-server, gdb-remote, --log-channels=foo\\\ \\\""" ''', whatever, QuoteForCreateProcess would return
lldb-server gdb-remote "--log-channels=foo\\\ \\\\\\\"\"\" '''" whatever (there are other ways to quote this too). Passing this string to CreateProcess will result in the original vector being available to the main function of lldb-server, which is what this code (and all other code that works with the Args class) expects.

Btw, there is already code for doing this in llvm (llvm::sys::flattenWindowsCommandLine), so we can just steal the implementation from there.

Hui added a comment.EditedTue, Jan 15, 10:15 AM

For example, for a Args vector like lldb-server, gdb-remote, --log-channels=foo\\\ \\\""" ''', whatever, QuoteForCreateProcess would return
lldb-server gdb-remote "--log-channels=foo\\\ \\\\\\\"\"\" '''" whatever (there are other ways to quote this too). Passing this string to CreateProcess will result in the original vector being available to the main function of lldb-server, which is what this code (and all other code that works with the Args class) expects.

Btw, there is already code for doing this in llvm (llvm::sys::flattenWindowsCommandLine), so we can just steal the implementation from there.

What do you think of the following codes to be added in Args?

bool Args::GetFlattenQuotedCommandString(std::string &command) const {
  std::vector<llvm::StringRef> args_ref;
  std::vector<std::string> owner;

  for (size_t i = 0; i < m_entries.size(); ++i) {
    if (m_entries[i].quote) {
      std::string arg;
      arg += m_entries[i].quote;
      arg += m_entries[i].ref;
      arg += m_entries[i].quote;
      owner.push_back(arg);
      args_ref.push_back(owner.back());
    } else {
      args_ref.push_back(m_entries[i].ref);
    }
  }

  command = llvm::sys::flattenWindowsCommandLine(args_ref);

  return !m_entries.empty();
}
In D56230#1358102, @Hui wrote:

What do you think of the following codes to be added in Args?

bool Args::GetFlattenQuotedCommandString(std::string &command) const {
  std::vector<llvm::StringRef> args_ref;
  std::vector<std::string> owner;

  for (size_t i = 0; i < m_entries.size(); ++i) {
    if (m_entries[i].quote) {
      std::string arg;
      arg += m_entries[i].quote;
      arg += m_entries[i].ref;
      arg += m_entries[i].quote;
      owner.push_back(arg);
      args_ref.push_back(owner.back());
    } else {
      args_ref.push_back(m_entries[i].ref);
    }
  }

  command = llvm::sys::flattenWindowsCommandLine(args_ref);

  return !m_entries.empty();
}

Yes, that's more-or-less what I had in mind. I have some comments about the implementation, but none of them are fundamental, I think:

  • add "windows" to the function name somewhere (GetFlattenedWindowsCommandString?, GetWindowsCommandString?, ...), as function uses quoting style which is specifically meant for windows, and is unlikely to be correct elsewhere
  • drop the if (m_entries[i].quote) branch. You don't need it here, and I don't believe it will be correct anyway, because all it will do is cause llvm::sys::flattenWindowsCommandLine to add one more quote level around that and escape the quotes added by yourself
Hui added a comment.Tue, Jan 15, 11:18 AM
  • drop the if (m_entries[i].quote) branch. You don't need it here, and I don't believe it will be correct anyway, because all it will do is cause llvm::sys::flattenWindowsCommandLine to add one more quote level around that and escape the quotes added by yourself

The quote char might be specified through Args::AppendArgument at user's will.

In D56230#1358269, @Hui wrote:
  • drop the if (m_entries[i].quote) branch. You don't need it here, and I don't believe it will be correct anyway, because all it will do is cause llvm::sys::flattenWindowsCommandLine to add one more quote level around that and escape the quotes added by yourself

The quote char might be specified through Args::AppendArgument at user's will.

Yes, but do you know what the user meant when he set it? I don't, but I'm pretty sure he did not mean it to be passed verbatim to the launched process. It certainly isn't what will happen on posix systems as there we just ignore it. Maybe nobody knows, which is why nobody sets it? I think it would be best to be consistent with posix systems, and ignore it here too (at least until we have a good reason to do otherwise).

In D56230#1358269, @Hui wrote:
  • drop the if (m_entries[i].quote) branch. You don't need it here, and I don't believe it will be correct anyway, because all it will do is cause llvm::sys::flattenWindowsCommandLine to add one more quote level around that and escape the quotes added by yourself

The quote char might be specified through Args::AppendArgument at user's will.

Yes, but do you know what the user meant when he set it? I don't, but I'm pretty sure he did not mean it to be passed verbatim to the launched process. It certainly isn't what will happen on posix systems as there we just ignore it. Maybe nobody knows, which is why nobody sets it? I think it would be best to be consistent with posix systems, and ignore it here too (at least until we have a good reason to do otherwise).

I've always disliked this argument and hoped that someday someone would remove it entirely. My recollection (which may be wrong) is that the only actual use of it is so that if someone types a command, and we later need to print the command back, we will print it with the same quote char. It almost seems like we could just delete the argument and use a standardized quote char when flattening a command string.

I've always disliked this argument and hoped that someday someone would remove it entirely. My recollection (which may be wrong) is that the only actual use of it is so that if someone types a command, and we later need to print the command back, we will print it with the same quote char. It almost seems like we could just delete the argument and use a standardized quote char when flattening a command string.

+100

BTW, today I've tried to switch ProcessLauncherWindows to flattenWindowsCommandLine and this change alone was enough to fix TestQuoting, which has some tests XFAILed for windows due to quoting problems. I haven't sent out a patch yet because that has also broken platform shell dir c:\ for some reason, and I haven't gotten around to investigating that. If you (for any value of you) have some time, I'd encourage you to look into that. Otherwise, I'll get to that eventually.

Hui added a comment.Thu, Jan 17, 8:07 AM

I've always disliked this argument and hoped that someday someone would remove it entirely. My recollection (which may be wrong) is that the only actual use of it is so that if someone types a command, and we later need to print the command back, we will print it with the same quote char. It almost seems like we could just delete the argument and use a standardized quote char when flattening a command string.

+100

BTW, today I've tried to switch ProcessLauncherWindows to flattenWindowsCommandLine and this change alone was enough to fix TestQuoting, which has some tests XFAILed for windows due to quoting problems. I haven't sent out a patch yet because that has also broken platform shell dir c:\ for some reason, and I haven't gotten around to investigating that. If you (for any value of you) have some time, I'd encourage you to look into that. Otherwise, I'll get to that eventually.

Yes. I myself added a patch to test quoting also launch gdbserver by lldb-server.exe. Ran well so far. Also lldb/unittests/Utility tests.

Haven't tried other regression tests yet.

+#if defined(_WIN32)
+TEST(ArgsTest, GetFlattenWindowsCommandString) {
+ Args args;
+ args.AppendArgument("D:\\launcher.exe");
+ args.AppendArgument("--log=abc def");
+
+ std::string stdstr;
+ ASSERT_TRUE(args.GetFlattenWindowsCommandString(stdstr));
+ EXPECT_EQ(stdstr, "\"D:\\launcher.exe\" \"--log=abc def\"");
+}
+#endif

In D56230#1361650, @Hui wrote:

+#if defined(_WIN32)
+TEST(ArgsTest, GetFlattenWindowsCommandString) {
+ Args args;
+ args.AppendArgument("D:\\launcher.exe");
+ args.AppendArgument("--log=abc def");
+
+ std::string stdstr;
+ ASSERT_TRUE(args.GetFlattenWindowsCommandString(stdstr));
+ EXPECT_EQ(stdstr, "\"D:\\launcher.exe\" \"--log=abc def\"");
+}
+#endif

BTW, if this is going to end up #ifdefed (I know you need to do that because flattenWindowsCommandLine is #ifdef _WIN32), then I believe it should go somewhere under Host/windows. It's not used outside of windows code anyway...

We could put it in Args.h, but then we'd need to un-ifdef the llvm function (which I believe we could do, but we'd need to get that ok'd by the llvm folks).

Hui added a comment.EditedThu, Jan 17, 9:04 AM

It could be the llvm::sys::flattenWindowsCommandLine issue to flatten the command “dir c:\" for Windows Command Terminal.
However it is not an issue for PS or MingGW.

It is observed the flattened one is

"\"C:\\WINDOWS\\system32\\cmd.exe\" /C \" dir c:\\\\\" "

which will be interpreted as the following that is not accepted by CMD.exe.(dir c:\ or dir c:\.\ is fine. There is no '\' directory or file on my Drive c).
However it is accepted by PS and MingGW.

"C:\\WINDOWS\\system32\\cmd.exe" /C " dir c:\\"