This is an archive of the discontinued LLVM Phabricator instance.

[lib/Support/CommandLine][windows] - Handle "\ " as a space token.
AbandonedPublic

Authored by grimar on Sep 28 2020, 3:57 AM.

Details

Summary

This fixes an issue for LLD:
https://bugs.llvm.org/show_bug.cgi?id=47527

We might have a response file, which contains a path
to a library which included space symbols, e.g:

-Lw:/common/lib/611042000\ GIN-PCIe\ Debug

Currently it is not handled properly, because we are not escaping
space characters, like GNU does.

Diff Detail

Event Timeline

grimar created this revision.Sep 28 2020, 3:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2020, 3:57 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
grimar requested review of this revision.Sep 28 2020, 3:57 AM
rnk added a comment.Sep 28 2020, 9:47 AM

I am skeptical of this change. TokenizeWindowsCommandLine is meant to precisely implement the behavior of how the Visual C runtime tokenizes command lines, which generally also matches how MSVC tools parse response files. Any deviation from these rules generally leads to problems, and we've been pretty careful so far.

The linked bug describes attempts to use LLD as a drop-in replacement for binutils gold, which uses GNU parsing rules, so we shouldn't be using TokenizeWindowsCommandLine. Maybe you want to use TokenizeGNUCommandLine.

If we switch to a GNU style tokenisation, I'd like to know the difference in behaviour versus the Windows one, since our existing user base will be used to the Windows version. That being said, since LLD is supposed to be GNU compatible, it would make sense if it matched the GNU behaviour.

Thanks, Reid, I guess you're right. I've posted a comment on the bug page.

If we switch to a GNU style tokenisation, I'd like to know the difference in behaviour versus the Windows one, since our existing user base will be used to the Windows version. That being said, since LLD is supposed to be GNU compatible, it would make sense if it matched the GNU behaviour.

I believe that windows tokenizer implements this:
https://docs.microsoft.com/en-us/previous-versions//17w5ykft(v=vs.85)?redirectedfrom=MSDN

I don't have a text description for GNU tokenizer, but it's implementation looks simple and straightforward. See cl::TokenizeGNUCommandLine (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Support/CommandLine.cpp#L827).

rnk added a comment.Sep 29 2020, 9:47 AM

That being said, since LLD is supposed to be GNU compatible, it would make sense if it matched the GNU behaviour.

I'm agreed on all fronts, but the above statement makes me twitch a bit, lld-link does not strive to be GNU compatible. :) But yes, generally, many parts of LLD aim to be GNU-compatible. We probably just need to more carefully nail down the conditions under which GNU response file parsing is desired. I believe the user always the escape hatch of --rsp-quoting=posix/windows.

Despite being marked as deprecated, I believe the documentation behind that link is accurate.

Unfortunately, Windows has (at least) two command line tokenizers.

One is GetCommandLineArgvW: https://docs.microsoft.com/en-us/windows/win32/api/shellapi/nf-shellapi-commandlinetoargvw

The other is built into the MS implementation of the C run-time library (which you get if you take the argument vector passed into main).

They behave nearly the same except in how they handle some edge cases related to backslashes and multiple levels of quotation marks. I think this Raymond Chen blog post gets into some of those issues: https://devblogs.microsoft.com/oldnewthing/20100917-00/?p=12833

In D88401#2301151, @rnk wrote:

That being said, since LLD is supposed to be GNU compatible, it would make sense if it matched the GNU behaviour.

I'm agreed on all fronts, but the above statement makes me twitch a bit, lld-link does not strive to be GNU compatible. :) But yes, generally, many parts of LLD aim to be GNU-compatible. We probably just need to more carefully nail down the conditions under which GNU response file parsing is desired. I believe the user always the escape hatch of --rsp-quoting=posix/windows.

That's fair - you can tell which version of LLD I use most often :-) Using --rsp-quoting=windows is probably a good enough escape hatch for those working with cross-platform toolchains, as long as we let people know this will be coming (i.e. we should release note this behaviour change).

grimar abandoned this revision.Sep 30 2020, 1:23 AM

Abandoning this, given the discussion.