This is an archive of the discontinued LLVM Phabricator instance.

[llvm-ar] Fix llvm-ar response file reading on Windows
ClosedPublic

Authored by gbreynoo on Oct 31 2019, 9:03 AM.

Details

Summary

Response files where not being correctly read on Windows, this change fixes the issue and adds some tests.

Diff Detail

Event Timeline

gbreynoo created this revision.Oct 31 2019, 9:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2019, 9:03 AM
gbreynoo set the repository for this revision to rL LLVM.Oct 31 2019, 9:04 AM
MaskRay added inline comments.
llvm/test/tools/llvm-ar/response-utf8.test
9

The reliance on en_US.UTF-8 is probably ok but not ideal. Do we need XFAIL: system-darwin (http://45.33.8.238/mac/1350/step_10.txt) and what's the resolution to the reverted D68472 (@thopre)?

thakis added a subscriber: thakis.Oct 31 2019, 1:35 PM
thakis added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
1153

If we do this, we almost certainly want to add an --rsp-quoting flag like clang-cl and lld-link have, for cross builds targeting windows on linux.

thopre added inline comments.Nov 4 2019, 5:41 AM
llvm/test/tools/llvm-ar/response-utf8.test
9

I haven't had time to test manually on Mac OS X yet as I was on holidays. I'll try to do it in the coming days and if all looks good I'll revert the revert and remove the XFAIL for system-darwin.

gbreynoo marked an inline comment as done.Nov 4 2019, 5:59 AM
gbreynoo added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
1153

I agree that flag would be useful but that it should be in another diff. I think that this change is useful by itself and could be committed first.

gbreynoo updated this revision to Diff 229054.Nov 13 2019, 3:31 AM
gbreynoo added a reviewer: thopre.

Added rsp-quoting switch as thopre requested.

ruiu added inline comments.Nov 13 2019, 9:36 PM
llvm/tools/llvm-ar/llvm-ar.cpp
269

MatchFlagWithArg function defined in ar_main does something very similar to this function, so can you factor it out and then use that function instead of writing a new one?

That doesn't only reduce the code size but also allows users use --rsp-quoting <style> in addition to --rsp-quoting=<style>.

thopre added inline comments.Nov 18 2019, 8:46 AM
llvm/test/tools/llvm-ar/response-utf8.test
9
gbreynoo updated this revision to Diff 230669.Nov 22 2019, 9:28 AM
  • Pulled out the MatchFlagWithArg lambda expression into static function, and done the same with some code that handles hyphen removal.
  • Added test cases to response.test.
  • Removed the python call in response-utf8.test. I would prefer the file comparison but it should be sufficient to show the archive is functioning properly.
MaskRay added inline comments.Nov 22 2019, 10:05 AM
llvm/test/tools/llvm-ar/response-utf8.test
9

@thopre Hi, have you tried fixing the macOS (or Windows) problem?

llvm/tools/llvm-ar/llvm-ar.cpp
1103

Prefer using in new code. It can be used in an alias template and thus improve consistency with other constructs of the language.

1106

I feel that we can get rid of some if (Arg.empty()) checks if matchFlagWithArg itself recognizes one or two dash, then removeHyphens may probably be unnecessary.

1130

Argv is passed as a non-const reference but it is not actually mutated in the function as I expected. Should getRspQuoting remove --rsp-quoting so that on L1204 we don't need if (matchFlagWithArg("rsp-quoting", Arg, ArgIt, Argv.end())... ?

gbreynoo updated this revision to Diff 231921.EditedDec 3 2019, 8:48 AM
  • Use using instead of typedef
  • removeHyphens is now called from matchFlagWithArg
  • Removed Argv reference argument from getRspQuoting
  • Fix response.test that failed on linux
gbreynoo marked an inline comment as done.Dec 3 2019, 9:25 AM
gbreynoo added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
1130

Good catch, it was left over from a previous implementation. I think with the current implementation it may be best not to modify Argv. The call to matchFlagWithArg is cleaner than removing rsp-quoting arguments that can be in one vector element or in two elements. If there is an implementation you had in mind I'd be happy to make changes.

Context not available.

https://llvm.org/docs/GettingStarted.html#sending-patches

apt install arcanist (clone libphutil + clone arcanist)
cd llvm; arc install-certificate
arc diff 'HEAD^'  # Make sure the commit message includes `Differential Revision: `. Update the revision with a full context
llvm/test/tools/llvm-ar/response.test
25

If \ has to be used, it is recommended to indent the continuation lines to make them stand out.

llvm/tools/llvm-ar/llvm-ar.cpp
182

Why KeyLetters? I think the ar terminology uses operations and modifiers. Collectively they can be called options.

1123

Redundant pair of braces

1181

space after ,

thopre added inline comments.Dec 3 2019, 1:05 PM
llvm/test/tools/llvm-ar/response-utf8.test
9

My bad, forgot about this message. I'm not sure what problem you are referring to. The file I linked to in my previous message passes on all three of Linux, Mac OS X and Microsoft Windows.

MaskRay added inline comments.Dec 3 2019, 1:37 PM
llvm/test/tools/llvm-ar/response-utf8.test
9

Sorry, I didn't notice that llvm/test/tools/llvm-ar/mri-nonascii.test has been re-committed.

gbreynoo updated this revision to Diff 232171.Dec 4 2019, 10:15 AM

Fixed:

  • Redundant pair of braces
  • space after ,
  • indent the continuation lines
gbreynoo marked an inline comment as done.Dec 4 2019, 10:54 AM
gbreynoo added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
182

I wanted to make the use of the variable clearer as the term Options differs in meaning:

  • In the help text the term Options is used for flags (non-single letter) that are not Operations or Modifiers.
  • In the documentation Options is for Operations , Modifiers and Other (non-single letter)
  • In the code the Options string only holds Operations and Modifiers, i.e. single letter arguments. Longer flags (not-single letter) are handled by other methods.

I hoped that changing the variable name would lessen confusion but can change it back if required.

ruiu added inline comments.Dec 16 2019, 10:38 PM
llvm/tools/llvm-ar/llvm-ar.cpp
88

clang nor lld support --rsp-quoting=default. If you pass --rsp-quoting=default to clang, it looks like it is accepted, but the truth is that clang ignores an --rsp-quoting if it is passed with an unknown argument (you can try --rsp-quoting=foo). So, for consistency, maybe you shouldn't add that one to llvm-ar?

402

Could you revert this naming change? It makes code diff a bit hard to follow.

1103

I don't think this using improves code readability.

1141

It seems that this code can be improved in terms of readability, like this:

for (StringRef Arg : Argv) {
  if (Arg == "-rsp-quoting=posix" || Arg == "--rsp-quoting=posix")
    Ret = cl::TokenizeGNUCommandLine;
  else if (Arg == "-rsp-quoting=windows" || Arg == "--rsp-quoting=windows")
    Ret = cl::TokenizeWindowsCommandLine;
}

This is I believe basically what we do in the clang driver.

gbreynoo updated this revision to Diff 235887.Jan 2 2020, 9:03 AM

Fixed:

  • Removed --rsp-quoting=default
  • Reverted rename of Options
  • Removed use of using
gbreynoo marked an inline comment as done.Jan 2 2020, 9:11 AM
gbreynoo added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
1141

llvm-ar handles all the other long arguments with MatchFlagWithArg and the majority of this patch is so rsp-quoting can share this behaviour. Would you prefer to use the code above? It would reduce the size of the patch but would mean behaviour inconsistencies in how long arguments are handled.

UTF-8 tests on Windows used to cause llvm-ar trouble. I don't know enough of Windows (nor do I have a Windows machine...) Hope other people can review and confirm the response-utf8.test test will not cause trouble.

MaskRay added inline comments.Jan 10 2020, 10:32 AM
llvm/test/tools/llvm-ar/response.test
28

Prefer single quotes. It does less translation. Shell quoting is difficult. With single quotes, a reader can think less what happens here.

MaskRay added inline comments.Jan 10 2020, 10:34 AM
llvm/test/tools/llvm-ar/response.test
28

If you do want a form feed, you need echo -e.

You are currently using a bash/zsh extension.

gbreynoo updated this revision to Diff 238526.Jan 16 2020, 9:30 AM
  • Use single quotes in tests
  • Clarify use of echo in response.test
gbreynoo marked an inline comment as done.Jan 16 2020, 9:37 AM
gbreynoo added inline comments.
llvm/test/tools/llvm-ar/response.test
28

I wasn't looking for a form feed but have used echo -e to clarify what is output

gbreynoo updated this revision to Diff 238530.Jan 16 2020, 9:40 AM

The use of python in the test response.test was failing on my Windows machine with Python 3. This patch fixes the issue.

UTF-8 tests on Windows used to cause llvm-ar trouble. I don't know enough of Windows (nor do I have a Windows machine...) Hope other people can review and confirm the response-utf8.test test will not cause trouble.

At @gbreynoo's request, I ran this test locally, and it passed fine. It's worth noting that I sit opposite him, so it's possible that my Windows machine has some unusual but identical-to-his setup that causes it to work. I also inspected the two response files used by the utf8 test, and they have been encoded in UTF-8 (the bytes the '£' character are 0xc2a3 as expected). Meanwhile, the file "response-utf8.test.tmp-£.a" appears to be named as such when viewed in the UI or via PowerShell's or ls's directory listing. The only issue I found was when lit printed the commands, the output got corrupted slightly by the '£' character, but that's not really an issue with the test, I think. I think the response-utf8.test looks good to me.

I haven't looked at the rest of the code. @MaskRay, if there is anything else you'd like me to take a look at, please let me know.

The tests are good, and I don't have more questions about Windows. But please mark all addresses issues "done". Click the "Done" button before uploading a Differential. When the Differential is uploaded, all marked comments will be marked "Done".

llvm/tools/llvm-ar/llvm-ar.cpp
1117

removeHyphens is only used once. Consider inlining it.

I feel a bit strange about removeHyphens. Does it accept an option without a dash?

gbreynoo marked 23 inline comments as done.Feb 14 2020, 5:37 AM
gbreynoo added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
1117

I have inlined the function..

gbreynoo updated this revision to Diff 244633.Feb 14 2020, 5:40 AM
gbreynoo marked an inline comment as done.

Inlined removeHyphens function

gbreynoo marked an inline comment as done.Feb 14 2020, 5:45 AM

Thanks Maskray for your comment regarding marking comments as "done", I'll look out for that in future.

llvm/tools/llvm-ar/llvm-ar.cpp
1117

Regarding the your question, the removeHyphens function would accept options without a dash and return the same value.

MaskRay added inline comments.Feb 21 2020, 9:30 AM
llvm/test/tools/llvm-ar/response.test
24

There is no positive --rsp-quoting= test, so it is hard to tell whether the feature works.

llvm/tools/llvm-ar/llvm-ar.cpp
1102

Unused

1121

Three-dash ---rsp-quoting will be accidentally accepted.

1137

SmallVector -> ArrayRef

1165

I think we may need some refactoring first. Let me check if the parsing logic can be simplified.

gbreynoo marked 4 inline comments as done.Feb 24 2020, 7:41 AM
gbreynoo updated this revision to Diff 246215.Feb 24 2020, 7:47 AM
  • Removed unused function removeHyphens
  • Fixed matchFlagWithArg so 3 hyphens are no longer excepted before an argument
  • getRspQuoting now uses ArraryRef instead of SmallVector
gbreynoo marked an inline comment as done.Feb 24 2020, 7:50 AM
gbreynoo added inline comments.
llvm/tools/llvm-ar/llvm-ar.cpp
1165

By moving to iterators my hope was this area would be easier to read, however it is still not ideal. Any suggestions for further refactoring would be appreciated.

MaskRay accepted this revision.Mar 2 2020, 9:04 AM
This revision is now accepted and ready to land.Mar 2 2020, 9:04 AM
This revision was automatically updated to reflect the committed changes.
sbc100 added a subscriber: sbc100.Mar 3 2020, 11:47 AM

It looks like there is some inconsistency about what default to use.

clang seems to default to GNU style, even when running on windows: https://github.com/llvm/llvm-project/blob/master/clang/tools/driver/driver.cpp#L386
This can be overridden by --driver-mode=cl or --rsp-quoting=windows, but the default with no arguments seems to be GNU.

The wasm version of lld (which I work on) also defaults to GNU style (https://github.com/llvm/llvm-project/blob/9989b859efccafacb0cc1f8d393d8b9fc49f4037/lld/wasm/Driver.cpp#L168) although the other lld flavors have logic that matches that in this change.

Do wasm-ld and clang need updating too?

ruiu added a comment.Mar 4 2020, 1:05 AM

lld's default choices are different among platforms as we are trying to match the native linkers' behaviors. So is true to clang; if you invoke clang as clang-cl (which is a Microsoft-compatible compiler driver), the value is set to Windows by default. I think we didn't have any other policy than this, as we are creating drop-in replacements.

As to wasm-ld, I don't think you need to care about compatibility, as this is the first (and perhaps the only?) linker for wasm. So I would stick with GNU-style quoting instead of make it vary depending on platform to keep it simple.

gbreynoo added a comment.EditedMar 4 2020, 1:12 AM

Hi sbc100, thanks for highlighting this inconsistency. I think ideally for the sake of consistency, wasm-ld and clang would be updated too. I wonder if it would be worth the effort to have some kind of shared implementation for this?

Regarding Rui's point, I think there is a balance with giving Windows users the command line behaviour they expect. Is it least surprising to them that there is difference in default behaviour between platforms or that the default behaviour for Windows is not Windows style?

sbc100 added a comment.Mar 4 2020, 4:40 AM

This issue we in emscripten is really consistency. We have a compiler driver that invokes clang, llvm-ar, llvm-nm, and wasm-ld. We have logic that uses response files for long command lines. I think it's reasonable to expect that all the tools should have the same default, no? Otherwise each subprocess callsite needs to encode the rsp style .. per tool and per platform.

ruiu added a comment.Mar 5 2020, 11:26 PM

Yeah, that's another type of consistency, and if you feel that that consistency is more important, feel free to make a change to wasm-ld so that the tool defaults to runtime OS's rsp-quoting style.