This is an archive of the discontinued LLVM Phabricator instance.

Windows: Add support for unicode command lines
ClosedPublic

Authored by majnemer on Oct 4 2013, 5:14 AM.

Details

Summary

The MSVCRT deliberately sends main() code-page specific characters.
This isn't too useful to LLVM as we end up converting the arguments to
UTF-16 and subsequently attempt to use the result as, for example, a
file name. Instead, we need to have the ability to access the Unicode
command line and transform it to UTF-8.

This has the distinct advantage over using the MSVC-specific wmain()
function as our entry point because:

  • It doesn't work on cygwin.
  • It only work on MinGW with caveats and only then on certain versions.
  • We get to keep our entry point as main(). :)

N.B. This patch includes fixes to other parts of lib/Support/Windows
s.t. we would be able to take advantage of getting the Unicode paths.
E.G. clang spawning clang -cc1 would want to give it Unicode arguments.

Diff Detail

Event Timeline

rnk added inline comments.Oct 4 2013, 6:54 AM
lib/Support/Windows/Process.inc
180–183

It seems silly to return the new args twice, once by return value and once by out param. Maybe it would be better to return an error_code?

193

Rather than doing this dance with StringOffsets to keep stable pointers into a std::vector, you could use a caller-allocated BumpPtrAllocator to allocate the strings. That way you get stable strings by default.

lib/Support/Windows/Program.inc
50

I'd rather not loop with goto. There are a couple easy ways to rewrite this with a loop construct.

lib/Support/Windows/Windows.h
27

You should IWYU. This header doesn't use either of these, does it?

majnemer added inline comments.Oct 4 2013, 7:45 AM
lib/Support/Windows/Process.inc
180–183

The intent is not to provide the caller with meaningful state in the last two parameters, they may not be used by an implementation; for example, the POSIX implementation doesn't use the second and third arguments at all, it just returns the first parameter.

I suppose I could use ErrorOr<> instead of Optional<>.

193

This can be done.

lib/Support/Windows/Program.inc
50

What's wrong with the goto? I don't see an issue. Would the other ways result in more or less code?

majnemer added inline comments.Oct 4 2013, 8:19 AM
lib/Support/Windows/Process.inc
193

Does the following interface seem reasonable to you?

error_code
GetArgumentVector(SmallVectorImpl<const char *> &NewArgs,
                  ArrayRef<const char *> ArgsFromMain,
                  SpecificBumpPtrAllocator<char> &ArgAllocator);
ruiu added inline comments.Oct 4 2013, 8:33 AM
include/llvm/Support/Process.h
176

In Unix this function does nothing and does not use any argument. In Windows the first argument (OriginalMainArgs) is ignored. It feels like they are really different, and they don't not necessarily have to have the same interface.

I'd probably rather make this a function defined only when LLVM_ON_WIN32 is defined, and guard the call of this function in LLVM's main() with LLVM_ON_WIN32.

lib/Support/Windows/Program.inc
50

This can easily be re-written with for(;;) { ...; if (len <= buffer.capacity()) break; ...}. It seems that's more normal way to write this kind of logic than goto.

rnk added inline comments.Oct 4 2013, 10:08 AM
include/llvm/Support/Process.h
176

I'd rather avoid ifdefs in main.

lib/Support/Windows/Process.inc
193

Looks good.

If we wanted to get even more minimal, we could use a MutableArrayRef<> to modify the original argv array in place! =D If the arg counts don't match we'd return an error and leave the array untouched.

lib/Support/Windows/Program.inc
50

I was thinking:

DWORD len = progNameUnicode.capacity();
do {
  progNameUnicode.resize(len);  // should be a nop on the first iteration.
  len = SearchPathW(...);
} while (len > buffer.capacity());
ruiu added inline comments.Oct 4 2013, 10:16 AM
include/llvm/Support/Process.h
176

I thought the same thing first, but found that we already have an #ifdef _WIN32 in main.

majnemer updated this revision to Unknown Object (????).Oct 6 2013, 5:01 AM

Address review comments.

rnk accepted this revision.Oct 6 2013, 8:11 AM

LGTM

ygao added a comment.Oct 6 2013, 11:39 AM

In lib/Support/Windows/Signals.inc,
+ HMODULE hLib = ::LoadLibrary(TEXT("Dbghelp.dll"));

Will LoadLibrary accept unicode file names, or do you have to use LoadLibraryW instead?

majnemer closed this revision.Oct 6 2013, 1:29 PM