This is an archive of the discontinued LLVM Phabricator instance.

Add writeFileWithSystemEncoding to LibLLVMSupport
ClosedPublic

Authored by rafaelauler on Aug 13 2014, 8:07 PM.


Details

Reviewers

Summary

This patch adds to LLVMSupport the capability of writing files with international characters encoded in the current system encoding. This is relevant for Windows, where we can either use UTF16 or the current code page (the legacy Windows international characters). On UNIX, the file is always saved in UTF8.

This patch also fixes a bug in the Unix version of argumentsFitWithinSystemLimits(). Both functions will be used in a patch for clang to thoroughly support response file creation when calling other tools, addressing PR15171. On Windows, to correctly support internationalization, we need the ability to write response files in either UTF16 or the current code page, depending on the tool we will call. GCC for mingw, for instance, requires files to be encoded in the current code page. MSVC tools require files to be encoded in UTF16.
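As a rough illustration of what "writing UTF16 with a BOM" means for UTF-8 input, here is a minimal, portable sketch. It is not the patch's implementation (the patch relies on the Windows conversion helpers shown in the diff further down); the function name utf8ToUtf16LEWithBOM is hypothetical, the output is assumed to be little-endian, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of LLVM): converts well-formed UTF-8 into
// UTF-16LE bytes prefixed with the little-endian BOM (FF FE).
// The input is NOT validated; malformed UTF-8 gives unspecified output.
std::string utf8ToUtf16LEWithBOM(const std::string &Utf8) {
  std::string Out = "\xff\xfe";
  // Append one 16-bit code unit, low byte first.
  auto Emit = [&Out](std::uint16_t Unit) {
    Out.push_back(static_cast<char>(Unit & 0xff));
    Out.push_back(static_cast<char>(Unit >> 8));
  };
  for (std::size_t I = 0; I < Utf8.size();) {
    unsigned char Lead = static_cast<unsigned char>(Utf8[I]);
    std::uint32_t CP;
    std::size_t Len;
    if (Lead < 0x80)      { CP = Lead;        Len = 1; }
    else if (Lead < 0xe0) { CP = Lead & 0x1f; Len = 2; }
    else if (Lead < 0xf0) { CP = Lead & 0x0f; Len = 3; }
    else                  { CP = Lead & 0x07; Len = 4; }
    assert(I + Len <= Utf8.size() && "truncated UTF-8 sequence");
    // Fold in the 6 payload bits of each continuation byte.
    for (std::size_t J = 1; J < Len; ++J)
      CP = (CP << 6) | (static_cast<unsigned char>(Utf8[I + J]) & 0x3f);
    I += Len;
    if (CP < 0x10000) {
      Emit(static_cast<std::uint16_t>(CP));
    } else {
      // Code points above the BMP become a surrogate pair.
      CP -= 0x10000;
      Emit(static_cast<std::uint16_t>(0xd800 | (CP >> 10)));
      Emit(static_cast<std::uint16_t>(0xdc00 | (CP & 0x3ff)));
    }
  }
  return Out;
}
```

The byte strings used in the unit test of this revision (utf8_text and utf16le_text) are a convenient input/output pair for checking a converter like this one.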

Diff Detail

Event Timeline

rafaelauler updated this revision to Diff 12481. Aug 13 2014, 8:07 PM

rafaelauler retitled this revision from to Add writeFileWithSystemEncoding to LibLLVMSupport.

rafaelauler updated this object.

rafaelauler edited the test plan for this revision.

rafaelauler added a reviewer: • rafael.

rafaelauler added a subscriber: Unknown Object (MLST).

The functionality introduced in this patch will be used by http://reviews.llvm.org/D4897

rnk added a subscriber: rnk. Aug 14 2014, 4:59 PM

Does gcc ignore things like UTF BOMs in response files? Clang sniffs for the UTF-16 BOM when parsing response files on Windows. The best case would be that we give gcc a UTF-8 response file with a BOM, IMO.
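For readers unfamiliar with BOM sniffing, the following is a minimal sketch of the idea: look at the first bytes of the file and classify them. It is not Clang's actual response-file code; the names BomKind and detectBom are hypothetical, and UTF-32 BOMs are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of the byte order marks discussed here.
enum class BomKind { None, UTF8, UTF16LE, UTF16BE };

// Hypothetical helper: inspects only the leading bytes of the buffer.
BomKind detectBom(const unsigned char *Data, std::size_t Size) {
  assert((Data != nullptr || Size == 0) && "null buffer with non-zero size");
  if (Size >= 3 && Data[0] == 0xef && Data[1] == 0xbb && Data[2] == 0xbf)
    return BomKind::UTF8;    // EF BB BF
  if (Size >= 2 && Data[0] == 0xff && Data[1] == 0xfe)
    return BomKind::UTF16LE; // FF FE
  if (Size >= 2 && Data[0] == 0xfe && Data[1] == 0xff)
    return BomKind::UTF16BE; // FE FF
  return BomKind::None;
}
```

A tool that does not perform a check like this will treat the BOM bytes as part of the first argument, which is one plausible explanation for a tool being confused by a BOM.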

Hi Rafael and Reid,

Thanks for sharing your opinion, I appreciate it. I organized the tests I ran on my Windows system into a table. I encoded a response file with international characters in different encodings and tested it with different tools. Here are my findings:

Tool	UTF8-no-BOM	UTF8-BOM	UTF16-BOM	Current Code Page (ISO-8859-1 in my system)
GCC 4.8.1 MinGW	Fail	Fail	Fail	Works
LD 2.24 MinGW	Fail	Fail	Fail	Works
GCC 4.8.3 Cygwin	Works	Fail	Fail	Fail
LD 2.24.51 Cygwin	Works	Fail	Fail	Fail

For Cygwin, I used bash and, for MinGW programs, the Windows command prompt.

This led me to believe that:

GNU tools on Cygwin or any UNIX system accept plain UTF8 without any BOM. Using a BOM will confuse the tool. No other encoding is understood.
GNU tools on MinGW only accept the current code page of the system. Any other encoding, with or without a BOM, is not understood.

That's why I designed my patch the way it is. On Windows native or MinGW, it uses current CP or UTF16 with BOM (for MSVC tools). On UNIX (including cygwin), it always uses UTF8 without BOM.

I supposed that all GNU tools work in this way and extended the information on all Clang Tool objects related to GNU to follow this as well. This is the meaning of using the enum member ResponseFileSupport::FullWithoutUTF16 in all GNU tools (no UTF16 means that it will use UTF8 on UNIX and Current code page on Windows).

I will update the comments in this patch to make this clear. I will also open a bug in binutils requesting them to implement UTF8/UTF16 response files on Windows/MinGW.

Best regards,
Rafael Auler
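The decision rule described in this message can be summarized in a few lines. This is only a sketch of the reasoning, not code from either patch: ToolFamily, FileEncoding and chooseResponseFileEncoding are hypothetical names, and "Windows host" here means native Windows or MinGW (Cygwin counts as UNIX, as stated above).

```cpp
#include <cassert>

// Hypothetical types, for illustration only.
enum class ToolFamily { GNU, MSVC };
enum class FileEncoding { UTF8NoBOM, CurrentCodePage, UTF16WithBOM };

// Picks the response-file encoding according to the findings in the table:
// UNIX (including Cygwin) always gets UTF-8 without a BOM; on Windows,
// MSVC tools get UTF-16 with a BOM and GNU tools get the current code page.
FileEncoding chooseResponseFileEncoding(ToolFamily Tool, bool HostIsWindows) {
  if (!HostIsWindows)
    return FileEncoding::UTF8NoBOM;
  return Tool == ToolFamily::MSVC ? FileEncoding::UTF16WithBOM
                                  : FileEncoding::CurrentCodePage;
}
```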

In this new patch, I implemented Rafael's suggestion of writing an encoding enum. Thus, I changed the last parameter of writeFileWithEncoding to be an encoding enum member.

Looks pretty good. Let me know if you need help landing it when you send a revised patch.

include/llvm/Support/Program.h
142	The contents parameter should be a StringRef. Most callers will have the length handy.
lib/Support/Unix/Program.inc
461–465	How about returning a std::error_code and using std::errc::io_error here and in the Windows implementation?
lib/Support/Windows/Program.inc
465–468	Similarly, we can just propagate ec here if we return std::error_code.
482	ditto
unittests/Support/ProgramTest.cpp
275	Hm, looks like we can't convert from SmallString to Twine. Twine is usually an implementation detail. Can you call .str() here instead?
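The error-reporting style suggested in these comments (return a std::error_code, use std::errc::io_error for stream failures) can be sketched with the standard library alone. This is an illustration under assumptions, not the code that landed: writeFile is a hypothetical stand-in, std::ofstream replaces LLVM's raw_fd_ostream, and every failure is collapsed to io_error.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical stand-in for the function under review. Returns a default
// (zero) error_code on success and std::errc::io_error on any failure.
std::error_code writeFile(const std::string &FileName,
                          const std::string &Contents) {
  std::ofstream OS(FileName, std::ios::binary | std::ios::trunc);
  if (!OS)
    return std::make_error_code(std::errc::io_error);
  OS.write(Contents.data(), static_cast<std::streamsize>(Contents.size()));
  OS.flush();
  if (!OS)
    return std::make_error_code(std::errc::io_error);
  return std::error_code();
}
```

A caller can then test the result in a boolean context and use the message() member of the returned code for diagnostics, without passing a separate error string around.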

Only a nit in addition to what Reid noticed. LGTM otherwise.

include/llvm/Support/Program.h
141	Please include a quick summary of your table. At least say that what requires us to use EM_CurrentCodePage is that gnu tools (ld, and gcc at least) on mingw only work with the current code page. BTW, have you tested with mingw-w64 too? If they support UTF8 or UTF16 and the old mingw does not that seems like a reason to push for a switch at some point.

Just one extra nit in addition to what Reid noticed. LGTM otherwise.

Thanks for your suggestions, I will submit a revised patch soon. Rafael, regarding your comment, I didn't add a table, but I did add a comment describing which encoding each tool uses on the clang side of the patch at http://reviews.llvm.org/D4897, in the file include/clang/Driver/Tool.h. However, if you want, I can add the table here too. What do you think?

Implemented rnk's and rafael's suggestions. Will update the clang part to D4897.

Rafael, I finished debugging our Arabic test case. I tested a filename with the character 0xd6 and it worked OK using just the regular cp 720. My initial command line test was failing because I was using "touch.exe" to create the file, and it was creating the file with the wrong name (the response file was OK, but the filename used a different character).

Almost there. Thanks for testing all the strange codepage combinations!

include/llvm/Support/Program.h
146	Please don't pass in the ErrMsg string. The caller can use the EC.message() method. That is an old API design we try to avoid. I see it is used because raw_fd_ostream was never updated to avoid it :-( Suggestion: pass in a file descriptor. That way you can use a raw_fd_ostream with a saner interface and we don't spread the use of std::string pointers for error messages to other APIs. Another option: keep the filename, but use openFileForWrite + the fd raw_fd_ostream constructor.
lib/Support/Unix/Program.inc
469	This bug fix could be in another patch, no?
lib/Support/Windows/Program.inc
171	This refactoring could be in another patch, no?
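To make the "pass in a file descriptor" suggestion concrete, here is a POSIX-only sketch of a writer that takes an already-open descriptor and reports failures through std::error_code instead of a std::string out-parameter. It is an assumption-laden illustration, not LLVM code: writeAllToFD is a hypothetical name, and plain ::write replaces raw_fd_ostream.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <system_error>
#include <unistd.h>

// Hypothetical helper: writes all of Contents to an already-open descriptor.
// The caller owns the descriptor and is responsible for closing it.
std::error_code writeAllToFD(int FD, const std::string &Contents) {
  const char *Ptr = Contents.data();
  std::size_t Left = Contents.size();
  while (Left > 0) {
    ssize_t N = ::write(FD, Ptr, Left);
    if (N < 0) {
      if (errno == EINTR)
        continue; // interrupted before any byte was written; retry
      return std::error_code(errno, std::generic_category());
    }
    // write() may be partial; advance past what was actually written.
    Ptr += N;
    Left -= static_cast<std::size_t>(N);
  }
  return std::error_code();
}
```

With this shape the error text is available on demand through the returned code, so no error-message pointer has to be threaded through the API.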

Hi Rafael,

Thanks for this extra round of review, I answered your questions below. I will send an updated patch shortly.

include/llvm/Support/Program.h
146	I will use your updated constructor from r216293, thanks for updating it!
lib/Support/Unix/Program.inc
469	Ok! Sent it in http://reviews.llvm.org/D5053
lib/Support/Windows/Program.inc
171	Ok! Sent it in http://reviews.llvm.org/D5054

Patch addressing reviewers' concerns. I uploaded the clang side in D4897, currently being reviewed by Sean.

Added the struct "EncodingStrategy", used to represent how the user wants to encode her file in different OSes. This was used to allow the removal of several ifdefs in the clang side of the patch at D4897.

rafael added inline comments. Aug 26 2014, 7:41 AM

include/llvm/Support/Program.h
140	When is the UnixEncoding not UTF8? If the problem was just the assert, I would suggest just writing the Unix version as std::error_code llvm::sys::writeFileWithEncoding(const char *FileName, StringRef Contents, EncodingStrategy /*ignored*/) { and documenting that UTF8 is always used on Unix. You can even name the enum WindowsEncodingMethod to make it explicit that it only matters on Windows.
154	FileName can be a StringRef now, no?
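The shape suggested here, an encoding parameter that is accepted everywhere but ignored on Unix, looks roughly like the sketch below. The enum mirrors the WindowsEncodingMethod that appears in the final diff; encodeForUnix is a hypothetical helper used only to show the unnamed, defaulted parameter.

```cpp
#include <cassert>
#include <string>

// Mirrors the enum in the final diff; the name says the choice matters
// only on Windows.
enum WindowsEncodingMethod { WEM_UTF8, WEM_CurrentCodePage, WEM_UTF16 };

// Hypothetical Unix-side helper: the encoding argument is left unnamed
// because it is deliberately ignored, so UTF-8 input passes through as is.
std::string encodeForUnix(const std::string &Contents,
                          WindowsEncodingMethod /*ignored*/ = WEM_UTF8) {
  return Contents;
}
```

Callers can therefore pass the same arguments on every platform without #ifdef blocks, which matches the motivation given for the earlier EncodingStrategy struct.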

Introduce "WindowsEncodingMethod" to stress that the file encoding is only relevant on Windows systems. Remove the old "EncodingStrategy", addressing Rafael's concerns.

I don't know why, but phabricator didn't send the email here updating this
thread. Anyway, I sent a new patch addressing your concerns. It is
available at http://reviews.llvm.org/D4896.

Description:

Introduce "WindowsEncodingMethod" to stress that the file encoding is only
relevant on Windows systems. Remove the old "EncodingStrategy", addressing
Rafael's concerns.

LGTM

This revision is now accepted and ready to land. Aug 26 2014, 2:15 PM

I realize this patch has been accepted, but I am updating it in light of the discussion in D4897, to update the LLVM side of it.

I will copy and paste the description that I put in D4897, since it reflects the modifications made on both sides (LLVM and Clang):

This update refactors the new code in Job.cpp to use streams, no longer building and managing its own char buffers. Thanks to Sean’s suggestion, the code is now much simpler. However, I wanted to build a stream that could write directly to the response file with the correct encoding, but raw_fd_ostream lacks the capability to write files with different encodings. Therefore, I added this capability to raw_fd_ostream with a small modification and introduced a new constructor variant that creates buffers that, when flushed to a file, are written in a different encoding. If you think it is inappropriate to have such a feature in raw_fd_ostream, I can work on creating my own stream class to do so.

I also refactored part of the write function in raw_fd_ostream out of this class, to avoid duplicating code in this implementation. Then, I changed my writeFileWithEncoding() function to write to a given file descriptor, which is assumed to be open, rather than opening and closing the file myself. This enabled me to make raw_fd_ostream work with different encodings with little extra code.

I also refactored part of the write function in raw_fd_ostream out of this class, to avoid duplicating code in this implementation.

Yes, this class is way too central to get an extra feature just for
response files.

You can maybe add a new streamer class, but I must say I am not sure
it is worth it. Response files are relatively small, so I would
probably start with what you had before: build a buffer and then dump
it. This also avoids having to handle incomplete characters
(getIncompleteUTFBytes)

Cheers,
Rafael

Hi Rafael,

No problem, I will just revert it to my last patch and rebase. This is the updated patch. In the Clang side, I will just use a raw_string_ostream and only call writeFileWithEncoding when the full buffer is written.

Best regards,
Rafael Auler
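The bookkeeping that the buffer-then-dump approach avoids can be illustrated as follows. When UTF-8 text is converted chunk by chunk, a chunk may end in the middle of a multi-byte character, and those trailing bytes must be held back until the next chunk arrives. The helper below is a hypothetical sketch of that check (its relation to the getIncompleteUTFBytes mentioned above is an assumption); it expects otherwise well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns how many bytes at the end of Chunk belong to
// a UTF-8 sequence that is not yet complete (0 if the chunk ends cleanly).
std::size_t incompleteUtf8Tail(const std::string &Chunk) {
  std::size_t Size = Chunk.size();
  // A UTF-8 sequence is at most 4 bytes, so look back at most 4 bytes.
  for (std::size_t Back = 1; Back <= 4 && Back <= Size; ++Back) {
    unsigned char B = static_cast<unsigned char>(Chunk[Size - Back]);
    if ((B & 0xc0) == 0x80)
      continue; // continuation byte; keep looking for the lead byte
    // B is a lead (or ASCII) byte; compute the length it announces.
    std::size_t Need = B < 0x80 ? 1 : B < 0xe0 ? 2 : B < 0xf0 ? 3 : 4;
    return Back < Need ? Back : 0;
  }
  return 0; // empty chunk, or no lead byte found (malformed input)
}
```

Collecting the whole response file in one string and converting it once makes this check unnecessary, which is why it is a reasonable trade-off for small files.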

Do you need me to commit this?

Yes, please.

r217068

Revision Contents

Path	Size
include/llvm/Support/Program.h	34 lines
lib/Support/Unix/Program.inc	18 lines
lib/Support/Windows/Path.inc	20 lines
lib/Support/Windows/Program.inc	46 lines
lib/Support/Windows/WindowsSupport.h	3 lines
unittests/Support/ProgramTest.cpp	50 lines

Diff 13190

include/llvm/Support/Program.h

Show First 20 Lines • Show All 120 Lines • ▼ Show 20 Lines	#endif
ExecuteNoWait(StringRef Program, const char **args, const char **env = nullptr,		ExecuteNoWait(StringRef Program, const char **args, const char **env = nullptr,
const StringRef **redirects = nullptr, unsigned memoryLimit = 0,		const StringRef **redirects = nullptr, unsigned memoryLimit = 0,
std::string *ErrMsg = nullptr, bool *ExecutionFailed = nullptr);		std::string *ErrMsg = nullptr, bool *ExecutionFailed = nullptr);

/// Return true if the given arguments fit within system-specific		/// Return true if the given arguments fit within system-specific
/// argument length limits.		/// argument length limits.
bool argumentsFitWithinSystemLimits(ArrayRef<const char*> Args);		bool argumentsFitWithinSystemLimits(ArrayRef<const char*> Args);

		/// File encoding options when writing contents that a non-UTF8 tool will
		/// read (on Windows systems). For UNIX, we always use UTF-8.
		enum WindowsEncodingMethod {
		/// UTF-8 is the LLVM native encoding, being the same as "do not perform
		/// encoding conversion".
		WEM_UTF8,
		WEM_CurrentCodePage,
		WEM_UTF16
		};

		/// Saves the UTF8-encoded \p contents string into the file \p FileName
		/// using a specific encoding.
		///
		/// This write file function adds the possibility to choose which encoding
		/// to use when writing a text file. On Windows, this is important when
		/// writing files with internationalization support with an encoding that is
		/// different from the one used in LLVM (UTF-8). We use this when writing
		/// response files, since GCC tools on MinGW only understand legacy code
		/// pages, and VisualStudio tools only understand UTF-16.
		/// For UNIX, using different encodings is silently ignored, since all tools
		/// work well with UTF-8.
		/// This function assumes that you only use UTF-8 text data and will convert
		/// it to your desired encoding before writing to the file.
		///
		/// FIXME: We use EM_CurrentCodePage to write response files for GNU tools in
		/// a MinGW/MinGW-w64 environment, which has serious flaws but currently is
		/// our best shot to make gcc/ld understand international characters. This
		/// should be changed as soon as binutils fix this to support UTF16 on mingw.
		///
		/// \returns non-zero error_code if failed
		std::error_code
		writeFileWithEncoding(StringRef FileName, StringRef Contents,
		WindowsEncodingMethod Encoding = WEM_UTF8);

/// This function waits for the process specified by \p PI to finish.		/// This function waits for the process specified by \p PI to finish.
/// \returns A \see ProcessInfo struct with Pid set to:		/// \returns A \see ProcessInfo struct with Pid set to:
/// \li The process id of the child process if the child process has changed		/// \li The process id of the child process if the child process has changed
/// state.		/// state.
/// \li 0 if the child process has not changed state.		/// \li 0 if the child process has not changed state.
/// \note Users of this function should always check the ReturnCode member of		/// \note Users of this function should always check the ReturnCode member of
/// the \see ProcessInfo returned from this function.		/// the \see ProcessInfo returned from this function.
ProcessInfo Wait(		ProcessInfo Wait(
Show All 16 Lines

lib/Support/Unix/Program.inc

Show All 13 Lines
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//=== WARNING: Implementation here must contain only generic UNIX code that		//=== WARNING: Implementation here must contain only generic UNIX code that
//=== is guaranteed to work on all UNIX variants.		//=== is guaranteed to work on all UNIX variants.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "Unix.h"		#include "Unix.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
		#include "llvm/Support/raw_ostream.h"
#include <llvm/Config/config.h>		#include <llvm/Config/config.h>
#if HAVE_SYS_STAT_H		#if HAVE_SYS_STAT_H
#include <sys/stat.h>		#include <sys/stat.h>
#endif		#endif
#if HAVE_SYS_RESOURCE_H		#if HAVE_SYS_RESOURCE_H
#include <sys/resource.h>		#include <sys/resource.h>
#endif		#endif
#if HAVE_SIGNAL_H		#if HAVE_SIGNAL_H
▲ Show 20 Lines • Show All 405 Lines • ▼ Show 20 Lines	// Do nothing, as Unix doesn't differentiate between text and binary.
return std::error_code();		return std::error_code();
}		}

std::error_code sys::ChangeStdoutToBinary(){		std::error_code sys::ChangeStdoutToBinary(){
// Do nothing, as Unix doesn't differentiate between text and binary.		// Do nothing, as Unix doesn't differentiate between text and binary.
return std::error_code();		return std::error_code();
}		}

		std::error_code
		llvm::sys::writeFileWithEncoding(StringRef FileName, StringRef Contents,
		WindowsEncodingMethod Encoding /*unused*/) {
		std::error_code EC;
		llvm::raw_fd_ostream OS(FileName, EC, llvm::sys::fs::OpenFlags::F_Text);

		if (EC)
		return EC;

		OS << Contents;

		if (OS.has_error())
		return std::make_error_code(std::errc::io_error);

		return EC;
		}

bool llvm::sys::argumentsFitWithinSystemLimits(ArrayRef<const char*> Args) {		bool llvm::sys::argumentsFitWithinSystemLimits(ArrayRef<const char*> Args) {
static long ArgMax = sysconf(_SC_ARG_MAX);		static long ArgMax = sysconf(_SC_ARG_MAX);

// System says no practical limit.		// System says no practical limit.
if (ArgMax == -1)		if (ArgMax == -1)
return true;		return true;

// Conservatively account for space required by environment variables.		// Conservatively account for space required by environment variables.
long HalfArgMax = ArgMax / 2;		long HalfArgMax = ArgMax / 2;

size_t ArgLength = 0;		size_t ArgLength = 0;
for (ArrayRef<const char*>::iterator I = Args.begin(), E = Args.end();		for (ArrayRef<const char*>::iterator I = Args.begin(), E = Args.end();
I != E; ++I) {		I != E; ++I) {
ArgLength += strlen(*I) + 1;		ArgLength += strlen(*I) + 1;
if (ArgLength > size_t(HalfArgMax)) {		if (ArgLength > size_t(HalfArgMax)) {
return false;		return false;
}		}
}		}
return true;		return true;
}		}
}		}

lib/Support/Windows/Path.inc

Show First 20 Lines • Show All 913 Lines • ▼ Show 20 Lines	std::error_code UTF8ToUTF16(llvm::StringRef utf8,

// Make utf16 null terminated.		// Make utf16 null terminated.
utf16.push_back(0);		utf16.push_back(0);
utf16.pop_back();		utf16.pop_back();

return std::error_code();		return std::error_code();
}		}

std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,		static
		std::error_code UTF16ToCodePage(unsigned codepage, const wchar_t *utf16,
		size_t utf16_len,
llvm::SmallVectorImpl<char> &utf8) {		llvm::SmallVectorImpl<char> &utf8) {
if (utf16_len) {		if (utf16_len) {
// Get length.		// Get length.
int len = ::WideCharToMultiByte(CP_UTF8, 0, utf16, utf16_len, utf8.begin(),		int len = ::WideCharToMultiByte(codepage, 0, utf16, utf16_len, utf8.begin(),
0, NULL, NULL);		0, NULL, NULL);

if (len == 0)		if (len == 0)
return windows_error(::GetLastError());		return windows_error(::GetLastError());

utf8.reserve(len);		utf8.reserve(len);
utf8.set_size(len);		utf8.set_size(len);

// Now do the actual conversion.		// Now do the actual conversion.
len = ::WideCharToMultiByte(CP_UTF8, 0, utf16, utf16_len, utf8.data(),		len = ::WideCharToMultiByte(codepage, 0, utf16, utf16_len, utf8.data(),
utf8.size(), NULL, NULL);		utf8.size(), NULL, NULL);

if (len == 0)		if (len == 0)
return windows_error(::GetLastError());		return windows_error(::GetLastError());
}		}

// Make utf8 null terminated.		// Make utf8 null terminated.
utf8.push_back(0);		utf8.push_back(0);
utf8.pop_back();		utf8.pop_back();

return std::error_code();		return std::error_code();
}		}

		std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,
		llvm::SmallVectorImpl<char> &utf8) {
		return UTF16ToCodePage(CP_UTF8, utf16, utf16_len, utf8);
		}

		std::error_code UTF16ToCurCP(const wchar_t *utf16, size_t utf16_len,
		llvm::SmallVectorImpl<char> &utf8) {
		return UTF16ToCodePage(CP_ACP, utf16, utf16_len, utf8);
		}
} // end namespace windows		} // end namespace windows
} // end namespace sys		} // end namespace sys
} // end namespace llvm		} // end namespace llvm

lib/Support/Windows/Program.inc

//===- Win32/Program.cpp - Win32 Program Implementation ------- -- C++ --===//		//===- Win32/Program.cpp - Win32 Program Implementation ------- -- C++ --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file provides the Win32 specific implementation of the Program class.		// This file provides the Win32 specific implementation of the Program class.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "WindowsSupport.h"		#include "WindowsSupport.h"
		#include "llvm/Support/ConvertUTF.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
		#include "llvm/Support/raw_ostream.h"
#include <cstdio>		#include <cstdio>
#include <fcntl.h>		#include <fcntl.h>
#include <io.h>		#include <io.h>
#include <malloc.h>		#include <malloc.h>

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//=== WARNING: Implementation here must contain only Win32 specific code		//=== WARNING: Implementation here must contain only Win32 specific code
//=== and must not be UNIX code		//=== and must not be UNIX code
▲ Show 20 Lines • Show All 137 Lines • ▼ Show 20 Lines	if (Quoted) {
len += PrecedingEscapes + 1;		len += PrecedingEscapes + 1;
}		}

return len;		return len;
}		}

}		}

static std::unique_ptr<char[]> flattenArgs(const char **args) {		static std::unique_ptr<char[]> flattenArgs(const char **args) {
// First, determine the length of the command line.		// First, determine the length of the command line.
unsigned len = 0;		unsigned len = 0;
for (unsigned i = 0; args[i]; i++) {		for (unsigned i = 0; args[i]; i++) {
len += ArgLenWithQuotes(args[i]) + 1;		len += ArgLenWithQuotes(args[i]) + 1;
}		}

// Now build the command line.		// Now build the command line.
std::unique_ptr<char[]> command(new char[len+1]);		std::unique_ptr<char[]> command(new char[len+1]);
▲ Show 20 Lines • Show All 257 Lines • ▼ Show 20 Lines	ProcessInfo sys::Wait(const ProcessInfo &PI, unsigned SecondsToWait,

std::error_code sys::ChangeStdoutToBinary(){		std::error_code sys::ChangeStdoutToBinary(){
int result = _setmode( _fileno(stdout), _O_BINARY );		int result = _setmode( _fileno(stdout), _O_BINARY );
if (result == -1)		if (result == -1)
return std::error_code(errno, std::generic_category());		return std::error_code(errno, std::generic_category());
return std::error_code();		return std::error_code();
}		}

		std::error_code
		llvm::sys::writeFileWithEncoding(StringRef FileName, StringRef Contents,
		WindowsEncodingMethod Encoding) {
		std::error_code EC;
		llvm::raw_fd_ostream OS(FileName, EC, llvm::sys::fs::OpenFlags::F_Text);
		if (EC)
		return EC;

		if (Encoding == WEM_UTF8) {
		OS << Contents;
		} else if (Encoding == WEM_CurrentCodePage) {
		SmallVector<wchar_t, 1> ArgsUTF16;
		SmallVector<char, 1> ArgsCurCP;

		if ((EC = windows::UTF8ToUTF16(Contents, ArgsUTF16)))
		return EC;

		if ((EC = windows::UTF16ToCurCP(
		ArgsUTF16.data(), ArgsUTF16.size(), ArgsCurCP)))
		return EC;

		OS.write(ArgsCurCP.data(), ArgsCurCP.size());
		} else if (Encoding == WEM_UTF16) {
		SmallVector<wchar_t, 1> ArgsUTF16;

		if ((EC = windows::UTF8ToUTF16(Contents, ArgsUTF16)))
		return EC;

		// Endianness guessing
		char BOM[2];
		uint16_t src = UNI_UTF16_BYTE_ORDER_MARK_NATIVE;
		memcpy(BOM, &src, 2);
		OS.write(BOM, 2);
		OS.write((char *)ArgsUTF16.data(), ArgsUTF16.size() << 1);
		} else {
		llvm_unreachable("Unknown encoding");
		}

		if (OS.has_error())
		return std::make_error_code(std::errc::io_error);

		return EC;
		}

bool llvm::sys::argumentsFitWithinSystemLimits(ArrayRef<const char*> Args) {		bool llvm::sys::argumentsFitWithinSystemLimits(ArrayRef<const char*> Args) {
// The documented max length of the command line passed to CreateProcess.		// The documented max length of the command line passed to CreateProcess.
static const size_t MaxCommandStringLength = 32768;		static const size_t MaxCommandStringLength = 32768;
size_t ArgLength = 0;		size_t ArgLength = 0;
for (ArrayRef<const char*>::iterator I = Args.begin(), E = Args.end();		for (ArrayRef<const char*>::iterator I = Args.begin(), E = Args.end();
I != E; ++I) {		I != E; ++I) {
// Account for the trailing space for every arg but the last one and the		// Account for the trailing space for every arg but the last one and the
// trailing NULL of the last argument.		// trailing NULL of the last argument.
ArgLength += ArgLenWithQuotes(*I) + 1;		ArgLength += ArgLenWithQuotes(*I) + 1;
if (ArgLength > MaxCommandStringLength) {		if (ArgLength > MaxCommandStringLength) {
return false;		return false;
}		}
}		}
return true;		return true;
}		}
}		}

lib/Support/Windows/WindowsSupport.h

Show First 20 Lines • Show All 160 Lines • ▼ Show 20 Lines	c_str(SmallVectorImpl<T> &str) {
return str.data();		return str.data();
}		}

namespace sys {		namespace sys {
namespace windows {		namespace windows {
std::error_code UTF8ToUTF16(StringRef utf8, SmallVectorImpl<wchar_t> &utf16);		std::error_code UTF8ToUTF16(StringRef utf8, SmallVectorImpl<wchar_t> &utf16);
std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,		std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,
SmallVectorImpl<char> &utf8);		SmallVectorImpl<char> &utf8);
		/// Convert from UTF16 to the current code page used in the system
		std::error_code UTF16ToCurCP(const wchar_t *utf16, size_t utf16_len,
		SmallVectorImpl<char> &utf8);
} // end namespace windows		} // end namespace windows
} // end namespace sys		} // end namespace sys
} // end namespace llvm.		} // end namespace llvm.

unittests/Support/ProgramTest.cpp

Show All 28 Lines
#include <windows.h>		#include <windows.h>
void sleep_for(unsigned int seconds) {		void sleep_for(unsigned int seconds) {
Sleep(seconds * 1000);		Sleep(seconds * 1000);
}		}
#else		#else
#error sleep_for is not implemented on your platform.		#error sleep_for is not implemented on your platform.
#endif		#endif

		#define ASSERT_NO_ERROR(x) \
		if (std::error_code ASSERT_NO_ERROR_ec = x) { \
		SmallString<128> MessageStorage; \
		raw_svector_ostream Message(MessageStorage); \
		Message << #x ": did not return errc::success.\n" \
		<< "error number: " << ASSERT_NO_ERROR_ec.value() << "\n" \
		<< "error message: " << ASSERT_NO_ERROR_ec.message() << "\n"; \
		GTEST_FATAL_FAILURE_(MessageStorage.c_str()); \
		} else { \
		}
// From TestMain.cpp.		// From TestMain.cpp.
extern const char *TestMainArgv0;		extern const char *TestMainArgv0;

namespace {		namespace {

using namespace llvm;		using namespace llvm;
using namespace sys;		using namespace sys;

▲ Show 20 Lines • Show All 170 Lines • ▼ Show 20 Lines	const char *argv[] = { Executable.c_str(), nullptr };
ASSERT_EQ(PI.Pid, 0)		ASSERT_EQ(PI.Pid, 0)
<< "On error ExecuteNoWait should return an invalid ProcessInfo";		<< "On error ExecuteNoWait should return an invalid ProcessInfo";
ASSERT_TRUE(ExecutionFailed);		ASSERT_TRUE(ExecutionFailed);
ASSERT_FALSE(Error.empty());		ASSERT_FALSE(Error.empty());
}		}

}		}

		#ifdef LLVM_ON_WIN32
		const char utf16le_text[] =
		"\x6c\x00\x69\x00\x6e\x00\x67\x00\xfc\x00\x69\x00\xe7\x00\x61\x00";
		const char utf16be_text[] =
		"\x00\x6c\x00\x69\x00\x6e\x00\x67\x00\xfc\x00\x69\x00\xe7\x00\x61";
		#endif
		const char utf8_text[] = "\x6c\x69\x6e\x67\xc3\xbc\x69\xc3\xa7\x61";

		TEST(ProgramTest, TestWriteWithSystemEncoding) {
		SmallString<128> TestDirectory;
		ASSERT_NO_ERROR(fs::createUniqueDirectory("program-test", TestDirectory));
		errs() << "Test Directory: " << TestDirectory << '\n';
		errs().flush();
		SmallString<128> file_pathname(TestDirectory);
		path::append(file_pathname, "international-file.txt");
		// Only on Windows we should encode in UTF16. For other systems, use UTF8
		ASSERT_NO_ERROR(sys::writeFileWithEncoding(file_pathname.c_str(), utf8_text,
		sys::WEM_UTF16));
		int fd = 0;
		ASSERT_NO_ERROR(fs::openFileForRead(file_pathname.c_str(), fd));
		#if defined(LLVM_ON_WIN32)
		char buf[18];
		ASSERT_EQ(::read(fd, buf, 18), 18);
		if (strncmp(buf, "\xfe\xff", 2) == 0) { // UTF16-BE
		ASSERT_EQ(strncmp(&buf[2], utf16be_text, 16), 0);
		} else if (strncmp(buf, "\xff\xfe", 2) == 0) { // UTF16-LE
		ASSERT_EQ(strncmp(&buf[2], utf16le_text, 16), 0);
		} else {
		FAIL() << "Invalid BOM in UTF-16 file";
		}
		#else
		char buf[10];
		ASSERT_EQ(::read(fd, buf, 10), 10);
		ASSERT_EQ(strncmp(buf, utf8_text, 10), 0);
		#endif
		::close(fd);
		ASSERT_NO_ERROR(fs::remove(file_pathname.str()));
		ASSERT_NO_ERROR(fs::remove(TestDirectory.str()));
		}

} // end anonymous namespace		} // end anonymous namespace
