Linker command-line arguments containing spaces were being stored without quoting the arguments. This resulted in ambiguous argument strings and prevented reliable parsing. MSVC simply wraps arguments containing spaces in quotes, so we do the same.
Diff Detail
Event Timeline
COFF/PDB.cpp | ||
---|---|---|
1068 | Can you factor this code as std::string quote(ArayRef<StringRef>)? I believe you also need to quote a double-quote as double double-quotes as well (but please check what the MS linker behavior is first.) |
COFF/PDB.cpp | ||
---|---|---|
1075–1076 | In addition to what rui suggested, can you only do this if the string is not already quoted? Furthermore, you may need to escape embedded quotes (is that something we need to worry about?) |
COFF/PDB.cpp | ||
---|---|---|
1068 | I'm not quite sure what you meant by "quote a double-quote as double double-quotes", or at least I couldn't find or reproduce anything to support this way of escaping a double-quote. Any further clues? Other than that, I've factored out the code into a quote() function as requested (patch pending). | |
1075–1076 | The arguments always appear to have all double-quotes removed. This applies for both the MS linker and lld-link (by the time this code sees the arguments anyhow). I've also tried escaping embedded double-quotes but they seem to be eaten by the argument parser and never appear in the tokenized args. Not to mention that there's really no place embedded quotes could be in any valid linker command-line (to the best of my knowledge). So I can't see either case being worth worrying about. |
COFF/PDB.cpp | ||
---|---|---|
1075–1076 | Since quotation marks are not valid in file names, I believe lantictac is correct that this is not worth worrying about escaping quotation marks within an argument (for linker commands). The rules for escaping quotation marks and backslashes in the Windows command line are arcane. Worse, they differ slightly depending on whether the program relies on the CRT to parse the command into a main-style argument vector of whether it calls CommandLineToArgvW to parse it explicitly. This Raymond Chen post touches on the some of the complexity, but the juiciest bits are in the comments: |
COFF/PDB.cpp | ||
---|---|---|
1068 | I don't know if it is applicable to this field, but in Windows command line, the way to escape a single double-quote in a double-quoted string is to replace it with two double-quote characters. For example, to get hello "world", one needs to quote it as "hello ""world""". At least the way you escape a string seems ambiguous when a string contains a double-quote character. |
COFF/PDB.cpp | ||
---|---|---|
1068 | It doesn't seem applicable in this case. At least I'm finding it impossible to get a single quote to show up in the args however I specify the command-line. So as far as I can ascertain it shouldn't be a concern for this code. | |
1075–1076 | To add to that, without a decent and likely test case it's going be hard to implement the functionality required to handle embedded double-quotations. How about we implement it if and when it's ever discovered in the wild? |
Hi All, What's the consensus on the additional checks? I can try and add support for the theoretical corner cases but without being able to construct test cases I'm not sure how useful it'd be.
COFF/PDB.cpp | ||
---|---|---|
1068 | It is possible to pass a single double-quote to the command and as a result the double-quote character shows up in Config->Args. Here is an example: > lld-link "Hello ""World""" C:\src\b\bin\lld-link.exe: error: could not open Hello "World": invalid argument |
COFF/PDB.cpp | ||
---|---|---|
1068 | Thanks for clarifying. I'll see if I can get it to to repro on a Windows build with a Windows specific test case. |
It looks like MSVC does not bother to quote quotation marks correctly.
/mnt/c/src/tmp$ cat quote.yaml --- !COFF header: Machine: IMAGE_FILE_MACHINE_I386 Characteristics: [ ] sections: - Name: '.text' Characteristics: [ IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_EXECUTE ] Alignment: 1 SectionData: 00 symbols: - Name: '"' Value: 0 SectionNumber: 1 SimpleType: IMAGE_SYM_TYPE_NULL ComplexType: IMAGE_SYM_DTYPE_NULL StorageClass: IMAGE_SYM_CLASS_EXTERNAL ... /mnt/c/src/tmp$ ../l/ra/bin/yaml2obj.exe quote.yaml > quote.obj C:\src\tmp>link /debug "/entry:""" /subsystem:console quote.obj Microsoft (R) Incremental Linker Version 14.12.25835.0 Copyright (C) Microsoft Corporation. All rights reserved. /mnt/c/src/tmp$ hd quote.pdb | grep -2 entry 000060d0 65 78 65 00 70 64 62 00 43 3a 5c 73 72 63 5c 74 |exe.pdb.C:\src\t| 000060e0 6d 70 5c 71 75 6f 74 65 2e 70 64 62 00 63 6d 64 |mp\quote.pdb.cmd| 000060f0 00 20 2f 64 65 62 75 67 20 2f 65 6e 74 72 79 3a |. /debug /entry:| 00006100 22 20 2f 73 75 62 73 79 73 74 65 6d 3a 63 6f 6e |" /subsystem:con| 00006110 73 6f 6c 65 00 00 00 00 1a 00 36 11 01 00 0c 00 |sole......6.....|
A few issues. On the DOS command-line I can shoehorn in some escaped double-quotes using /manifestuac:"level='asInvoker' uiAccess=""false""" and still generate a PDB. However:
- I can't figure out a way to get the test runner to pass the escaped double-quotes during testing. It either mangles the command-line or the double-quotes are ignored. Only directly running under cmd.exe seems to work.
- I'm still not convinced anyone would ever encounter an escaped double-quote in a link.exe command-line IRL. Although, feel free to come up with a realistic example I haven't thought of.
- I'm not sure if escaping double-quotes as "" would be the best method of escaping the quote since this appears to be specific to the the Command Prompt parsing rules (and MSVC offers no standard approach).
- As pcc points out MSVC link.exe doesn't escape these inner double-quotes anyhow.
- The link command-line stored in the PDB unlikely to matter to the vast majority of users.
In short, I can't really afford much more time on this unless anyone can resolve/answer/workaround a few of the issues above. Any thoughts?
Also: Just noticed that response files aren't being expanded inline in the stored command-line (as per MSVC link.exe). I'll open a bug for it.
@ruiu what is the consensus on this? What would be a practical use-case for escaping double-quotes?
File names can already be ruled out on Windows, as double quotes are reserved.
That could happen if one is linking a COFF target on Linux system? However in that case the 'cmd' field in the PDB will not be reusable afterwards on a Windows system, which is the whole purpose of this change. If that happens, we should rather warn if reserved Windows characters are used in the 'cmd' field?
As a counter-example, /manifestuac uses single quotes to prevent such issues: /manifestuac:"level='asInvoker' uiAccess='false'"
Second that.
Would be nice to expand response files. Did you open a bug for this/are you working on this?
The bug is here: 38085. I started looking into it and it looks like it ought to be fairly simple to implement since LLD already has to generate the expanded arguments. The only challenge is passing them back to the PDB generation code. Probably won't get to it this week however.
@lantictac The following works in the test:
RUN: lld-link %S/a.obj /entry:"1 "'"'hello'"'" 2"
Outputs as expected:
lld-link: error: <root>: undefined symbol: 1 "hello" 2
Could you please update the diff to handle this case?
Can you factor this code as std::string quote(ArayRef<StringRef>)?
I believe you also need to quote a double-quote as double double-quotes as well (but please check what the MS linker behavior is first.)