This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Quote linker arguments containing spaces (mimic MSVC)
AbandonedPublic

Authored by aganea on May 4 2018, 6:01 AM.

Details

Summary

Linker command-line arguments containing spaces were being stored without quoting the arguments. This resulted in ambiguous argument strings and prevented reliable parsing. MSVC simply wraps arguments containing spaces in quotes, so we do the same.

Diff Detail

Event Timeline

lantictac created this revision.May 4 2018, 6:01 AM
aganea added a subscriber: aganea.May 4 2018, 2:39 PM
ruiu added inline comments.May 4 2018, 3:13 PM
COFF/PDB.cpp
1084

Can you factor this code as std::string quote(ArayRef<StringRef>)?

I believe you also need to quote a double-quote as double double-quotes as well (but please check what the MS linker behavior is first.)

zturner added inline comments.May 4 2018, 3:20 PM
COFF/PDB.cpp
1091–1092

In addition to what rui suggested, can you only do this if the string is not already quoted? Furthermore, you may need to escape embedded quotes (is that something we need to worry about?)

lantictac added inline comments.May 7 2018, 3:02 PM
COFF/PDB.cpp
1084

I'm not quite sure what you meant by "quote a double-quote as double double-quotes", or at least I couldn't find or reproduce anything to support this way of escaping a double-quote. Any further clues?

Other than that, I've factored out the code into a quote() function as requested (patch pending).

1091–1092

The arguments always appear to have all double-quotes removed. This applies for both the MS linker and lld-link (by the time this code sees the arguments anyhow).

I've also tried escaping embedded double-quotes but they seem to be eaten by the argument parser and never appear in the tokenized args. Not to mention that there's really no place embedded quotes could be in any valid linker command-line (to the best of my knowledge).

So I can't see either case being worth worrying about.

lantictac updated this revision to Diff 145577.May 7 2018, 3:53 PM

Extracted implementation to a quote() function.

amccarth added inline comments.
COFF/PDB.cpp
1091–1092

Since quotation marks are not valid in file names, I believe lantictac is correct that this is not worth worrying about escaping quotation marks within an argument (for linker commands).

The rules for escaping quotation marks and backslashes in the Windows command line are arcane. Worse, they differ slightly depending on whether the program relies on the CRT to parse the command into a main-style argument vector of whether it calls CommandLineToArgvW to parse it explicitly.

This Raymond Chen post touches on the some of the complexity, but the juiciest bits are in the comments:
https://blogs.msdn.microsoft.com/oldnewthing/20100917-00/?p=12833

ruiu added inline comments.May 7 2018, 4:03 PM
COFF/PDB.cpp
1084

I don't know if it is applicable to this field, but in Windows command line, the way to escape a single double-quote in a double-quoted string is to replace it with two double-quote characters.

For example, to get hello "world", one needs to quote it as "hello ""world""".

At least the way you escape a string seems ambiguous when a string contains a double-quote character.

lantictac added inline comments.May 8 2018, 11:38 AM
COFF/PDB.cpp
1084

It doesn't seem applicable in this case. At least I'm finding it impossible to get a single quote to show up in the args however I specify the command-line.

So as far as I can ascertain it shouldn't be a concern for this code.

1091–1092

To add to that, without a decent and likely test case it's going be hard to implement the functionality required to handle embedded double-quotations. How about we implement it if and when it's ever discovered in the wild?

Hi All, What's the consensus on the additional checks? I can try and add support for the theoretical corner cases but without being able to construct test cases I'm not sure how useful it'd be.

ruiu added inline comments.May 11 2018, 10:58 AM
COFF/PDB.cpp
1084

It is possible to pass a single double-quote to the command and as a result the double-quote character shows up in Config->Args. Here is an example:

> lld-link "Hello ""World"""
C:\src\b\bin\lld-link.exe: error: could not open Hello "World": invalid argument
lantictac added inline comments.May 11 2018, 11:14 AM
COFF/PDB.cpp
1084

Thanks for clarifying. I'll see if I can get it to to repro on a Windows build with a Windows specific test case.

pcc added a subscriber: pcc.May 11 2018, 11:17 AM

It looks like MSVC does not bother to quote quotation marks correctly.

/mnt/c/src/tmp$ cat quote.yaml
--- !COFF
header:
  Machine:         IMAGE_FILE_MACHINE_I386
  Characteristics: [  ]  
sections:
  - Name:            '.text'
    Characteristics: [ IMAGE_SCN_CNT_CODE, IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_EXECUTE ]
    Alignment:       1   
    SectionData:     00  
symbols:
  - Name:            '"'
    Value:           0
    SectionNumber:   1
    SimpleType:      IMAGE_SYM_TYPE_NULL
    ComplexType:     IMAGE_SYM_DTYPE_NULL
    StorageClass:    IMAGE_SYM_CLASS_EXTERNAL
...
/mnt/c/src/tmp$ ../l/ra/bin/yaml2obj.exe quote.yaml  > quote.obj

C:\src\tmp>link /debug "/entry:""" /subsystem:console quote.obj
Microsoft (R) Incremental Linker Version 14.12.25835.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/mnt/c/src/tmp$ hd quote.pdb | grep -2 entry
000060d0  65 78 65 00 70 64 62 00  43 3a 5c 73 72 63 5c 74  |exe.pdb.C:\src\t|
000060e0  6d 70 5c 71 75 6f 74 65  2e 70 64 62 00 63 6d 64  |mp\quote.pdb.cmd|
000060f0  00 20 2f 64 65 62 75 67  20 2f 65 6e 74 72 79 3a  |. /debug /entry:|
00006100  22 20 2f 73 75 62 73 79  73 74 65 6d 3a 63 6f 6e  |" /subsystem:con|
00006110  73 6f 6c 65 00 00 00 00  1a 00 36 11 01 00 0c 00  |sole......6.....|

A few issues. On the DOS command-line I can shoehorn in some escaped double-quotes using /manifestuac:"level='asInvoker' uiAccess=""false""" and still generate a PDB. However:

  1. I can't figure out a way to get the test runner to pass the escaped double-quotes during testing. It either mangles the command-line or the double-quotes are ignored. Only directly running under cmd.exe seems to work.
  2. I'm still not convinced anyone would ever encounter an escaped double-quote in a link.exe command-line IRL. Although, feel free to come up with a realistic example I haven't thought of.
  3. I'm not sure if escaping double-quotes as "" would be the best method of escaping the quote since this appears to be specific to the the Command Prompt parsing rules (and MSVC offers no standard approach).
  4. As pcc points out MSVC link.exe doesn't escape these inner double-quotes anyhow.
  5. The link command-line stored in the PDB unlikely to matter to the vast majority of users.

In short, I can't really afford much more time on this unless anyone can resolve/answer/workaround a few of the issues above. Any thoughts?

Also: Just noticed that response files aren't being expanded inline in the stored command-line (as per MSVC link.exe). I'll open a bug for it.

aganea added a comment.Jul 5 2018, 2:24 PM

@ruiu what is the consensus on this? What would be a practical use-case for escaping double-quotes?

File names can already be ruled out on Windows, as double quotes are reserved.

That could happen if one is linking a COFF target on Linux system? However in that case the 'cmd' field in the PDB will not be reusable afterwards on a Windows system, which is the whole purpose of this change. If that happens, we should rather warn if reserved Windows characters are used in the 'cmd' field?

As a counter-example, /manifestuac uses single quotes to prevent such issues: /manifestuac:"level='asInvoker' uiAccess='false'"

Also: Just noticed that response files aren't being expanded inline in the stored command-line (as per MSVC link.exe). I'll open a bug for it.

Second that.
Would be nice to expand response files. Did you open a bug for this/are you working on this?

Also: Just noticed that response files aren't being expanded inline in the stored command-line (as per MSVC link.exe). I'll open a bug for it.

Second that.
Would be nice to expand response files. Did you open a bug for this/are you working on this?

The bug is here: 38085. I started looking into it and it looks like it ought to be fairly simple to implement since LLD already has to generate the expanded arguments. The only challenge is passing them back to the PDB generation code. Probably won't get to it this week however.

@lantictac The following works in the test:

RUN: lld-link %S/a.obj /entry:"1 "'"'hello'"'" 2"

Outputs as expected:

lld-link: error: <root>: undefined symbol: 1 "hello" 2

Could you please update the diff to handle this case?

aganea commandeered this revision.Nov 30 2018, 8:41 AM
aganea added a reviewer: lantictac.

Commited as rL348001: [PDB] Quote linker arguments containing spaces (mimic MSVC)

aganea abandoned this revision.Nov 30 2018, 8:41 AM