This is an archive of the discontinued LLVM Phabricator instance.

[Support] Avoid calling CommandLineToArgvW from shell32.dll
ClosedPublic

Authored by rnk on Sep 11 2018, 10:38 AM.

Details

Summary

Shell32.dll depends on gdi32.dll and user32.dll, which are mostly DLLs
for Windows GUI functionality. LLVM's utilities don't typically need GUI
functionality, and loading these DLLs seems to be slowing down startup.
Also, we already have an implementation of Windows command line
tokenization in cl::TokenizeWindowsCommandLine, so we can just use it.

The goal is to get the original argv in UTF-8, so that it can pass
through most LLVM string APIs. A Windows process starts life with a
UTF-16 string for its command line, and it can be retreived with
GetCommandLineW from kernel32.dll.

Previously, we would:

  1. Get the wide command line
  2. Call CommandLineToArgvW to handle quoting rules and separate it into arguments.
  3. For each wide argument, expand wildcards (* and ?) using FindFirstFileW.
  4. Convert each argument to UTF-8

Now we:

  1. Get the wide command line, convert the whole thing to UTF-8
  2. Tokenize the UTF-8 command line with cl::TokenizeWindowsCommandLine
  3. For each argument, expand wildcards if present
    • This requires converting back to UTF-16 to call FindFirstFileW
    • Results of FindFirstFileW must be converted back to UTF-8

Diff Detail

Repository
rL LLVM

Event Timeline

rnk created this revision.Sep 11 2018, 10:38 AM

When you commit this, make sure to include in the commit message how much speedup you're seeing as a result of this.

llvm/lib/Support/Windows/Process.inc
147 ↗(On Diff #164927)

Can you make the first arg a StringRef?

150 ↗(On Diff #164927)

EC?

155 ↗(On Diff #164927)

Then you can write this as if (Arg.find_first_of("*?") == StringRef::npos || Arg == "/?" || Arg == "-?")

(one could argue we should have a function called StringRef::contains_any_of(StringRef Chars)). In any case, even though it's a bit longer I find the additional clarity from meaningful function names better than cryptic names like strpbrk which I have to look up on cppreference every time I see it.

179 ↗(On Diff #164927)

This can just be a SmallString<MAX_PATH>

185 ↗(On Diff #164927)

I think this can just be a SmallString<MAX_PATH>

192 ↗(On Diff #164927)

And you can just write append(Dir, FileName);

rnk added a comment.Sep 11 2018, 1:01 PM

When you commit this, make sure to include in the commit message how much speedup you're seeing as a result of this.

There is no speedup without adding /delayload:shell32.dll, which will be a separate patch.

rnk updated this revision to Diff 164956.Sep 11 2018, 1:07 PM
rnk marked 6 inline comments as done.
  • Be more string-y
zturner accepted this revision.Sep 11 2018, 1:12 PM
This revision is now accepted and ready to land.Sep 11 2018, 1:12 PM
This revision was automatically updated to reflect the committed changes.
rnk added a comment.Sep 11 2018, 2:29 PM

Unfortunately, it looks like this changed the behavior of newlines embedded in arguments. When parsing response files, we want to treat \n as whitespace. When parsing a command line, CommandLineToArgvW apparently does not. The documentation says it separates arguments on "whitespace characters", but it doesn't say what it considers to be whitespace. Apparently \n is not whitespace.

In the meantime, I made LLVM quote newlines in arguments to fix the clang tests that relied on this behavior in rL341992.