This is an archive of the discontinued LLVM Phabricator instance.

[Driver] Don't escape backslashes on Windows
AbandonedPublic

Authored by smeenai on Jul 10 2019, 3:21 PM.

Details

Summary

Escaping backslashes results in unnatural looking output, and the actual
escapes are mostly unnecessary. We were also not consistent about when
we escaped backslashes and when we didn't, making it impossible to e.g.
store the InstalledDir output to a FileCheck variable and then use that
variable to check paths relative to the driver.

This change could be generalized to any platform which uses the
backslash as a path separator instead of just Windows. I'm unaware of
other such platforms, however, and I'm also unsure if not escaping the
backslash would be appropriate for those platforms, so I'm limiting to
Windows for now.

Event Timeline

smeenai created this revision.Jul 10 2019, 3:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2019, 3:21 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Hmm, this only fixes -###. Looking into -v now.

Hmm, this only fixes -###. Looking into -v now.

Ignore this – I was testing with a version without my change :D

smeenai added a subscriber: phosek.Jul 10 2019, 3:32 PM

CC @phosek, since I believe you had issues writing tests referencing the installation directory in the past.

rnk added a comment.Jul 10 2019, 3:35 PM

An extremely convenient feature of the current escaping pattern is that it magically works for both the cmd shell and various bash implementations. You can simply copy paste the commands and run them. This matters, for example, for the .sh crash reproducer script. On Windows today, you can just run it with bash, and it works. So, I'd prefer it if we didn't do this.

If you want -### output to be more FileCheck friendly, then I would recommend adding a new driver flag for testing that dumps the commands without doing any escaping. This came up before:
https://reviews.llvm.org/D62493

In D64538#1579561, @rnk wrote:

An extremely convenient feature of the current escaping pattern is that it magically works for both the cmd shell and various bash implementations. You can simply copy paste the commands and run them. This matters, for example, for the .sh crash reproducer script. On Windows today, you can just run it with bash, and it works. So, I'd prefer it if we didn't do this.

If you want -### output to be more FileCheck friendly, then I would recommend adding a new driver flag for testing that dumps the commands without doing any escaping. This came up before:
https://reviews.llvm.org/D62493

As long as the backslashes aren't followed by a few special characters (as detailed in my comment), the backslash should still continue to work. I've been testing this with git bash and I'm still able to copy-paste the ouptut command.

I'm okay adding a new driver flag which just skips all escaping, but I'd also prefer for stuff running on Windows to follow Windows path conventions.

rnk added a comment.Jul 10 2019, 3:58 PM
In D64538#1579561, @rnk wrote:

An extremely convenient feature of the current escaping pattern is that it magically works for both the cmd shell and various bash implementations. You can simply copy paste the commands and run them. This matters, for example, for the .sh crash reproducer script. On Windows today, you can just run it with bash, and it works. So, I'd prefer it if we didn't do this.

If you want -### output to be more FileCheck friendly, then I would recommend adding a new driver flag for testing that dumps the commands without doing any escaping. This came up before:
https://reviews.llvm.org/D62493

As long as the backslashes aren't followed by a few special characters (as detailed in my comment), the backslash should still continue to work. I've been testing this with git bash and I'm still able to copy-paste the ouptut command.

For the use case of crash reproduction, we often get these .sh files from users, and then try to reproduce and reduce the crash on Linux. Right now it's just a matter of adjusting the clang binary being run to reproduce a crash, but with this change, it will be harder. The current escaping is kind of the universal currency, a quoting style that is accepted everywhere.

I'm okay adding a new driver flag which just skips all escaping, but I'd also prefer for stuff running on Windows to follow Windows path conventions.

In D64538#1579610, @rnk wrote:
In D64538#1579561, @rnk wrote:

An extremely convenient feature of the current escaping pattern is that it magically works for both the cmd shell and various bash implementations. You can simply copy paste the commands and run them. This matters, for example, for the .sh crash reproducer script. On Windows today, you can just run it with bash, and it works. So, I'd prefer it if we didn't do this.

If you want -### output to be more FileCheck friendly, then I would recommend adding a new driver flag for testing that dumps the commands without doing any escaping. This came up before:
https://reviews.llvm.org/D62493

As long as the backslashes aren't followed by a few special characters (as detailed in my comment), the backslash should still continue to work. I've been testing this with git bash and I'm still able to copy-paste the ouptut command.

For the use case of crash reproduction, we often get these .sh files from users, and then try to reproduce and reduce the crash on Linux. Right now it's just a matter of adjusting the clang binary being run to reproduce a crash, but with this change, it will be harder. The current escaping is kind of the universal currency, a quoting style that is accepted everywhere.

I'm not fully understanding this. If the .sh file was generated on Windows, it'll contain backslashes in any arguments that are paths, right? Before this change, those backslashes would have been doubled up, but the backslash still wouldn't work as a path separator on Linux, so those arguments would need manual adjustment. I guess the other case that matters is if you have a backslash in an argument which isn't a path, in which case the old behavior was definitely correct and the new one definitely isn't.

rnk added a comment.Jul 10 2019, 7:29 PM

I'm not fully understanding this. If the .sh file was generated on Windows, it'll contain backslashes in any arguments that are paths, right? Before this change, those backslashes would have been doubled up, but the backslash still wouldn't work as a path separator on Linux, so those arguments would need manual adjustment. I guess the other case that matters is if you have a backslash in an argument which isn't a path, in which case the old behavior was definitely correct and the new one definitely isn't.

At least in Chromium, users pass flags like -DFOO_EXPORT="__declspec(dllexport)". I can't remember a concrete instance where a user passed non-path flags with backslashes in them, but you could imagine a hypothetical argument like: -DSOME_STRING="foo \"bar baz\"", and if we don't escape those backslashes, the argument will be tokenized incorrectly. I guess that example is an objection to this part of the comment:

// double quotes only when it's followed by $, `, ", \, or a literal newline;
// the last three are disallowed in paths on Windows and the first two are
// unlikely, so this shouldn't cause issues in practice. Note that we always

I can also imagine tokenization going wrong if some inconsequential path ends in a trailing backslash: -fdebug-compilation-dir=C:\foo\bar\ -> "-fdebug-compilation-dir=C:\foo\bar\"

I like what we do now because it's safe and handles the general case of backslashes in flags without any gotchas or corner cases. If the specific issue that we have is that driver tests are hard to write because of the variance in path printing, driver testing could already be greatly improved by adding a filecheck-friendly version of -###, so we should do that anyway.

smeenai abandoned this revision.Aug 6 2019, 6:02 PM

Abandoning in favor of D65839