This is an archive of the discontinued LLVM Phabricator instance.

Add writeFileWithSystemEncoding to LibLLVMSupport
ClosedPublic

Authored by rafaelauler on Aug 13 2014, 8:07 PM.

Summary

This patch adds to LLVMSupport the capability of writing files with international characters encoded in the current system encoding. This is relevant for Windows, where we can either use UTF16 or the current code page (the legacy Windows international characters). On UNIX, the file is always saved in UTF8.

This patch also fixes a bug in the Unix version of argumentsFitWithinSystemLimits(). Both functions will be used in a patch for clang to thoroughly support response file creation when calling other tools, addressing PR15171. On Windows, to correctly support internationalization, we need the ability to write response files either in UTF16 or in the current code page, depending on the tool we will call. GCC for MinGW, for instance, requires files to be encoded in the current code page. MSVC tools require files to be encoded in UTF16.

Event Timeline

rafaelauler updated this revision to Diff 12481. Aug 13 2014, 8:07 PM

rafaelauler retitled this revision from to Add writeFileWithSystemEncoding to LibLLVMSupport.

rafaelauler updated this object.

rafaelauler edited the test plan for this revision.

rafaelauler added a reviewer: rafael.

rafaelauler added a subscriber: Unknown Object (MLST).

The functionality introduced in this patch will be used by http://reviews.llvm.org/D4897

rnk added a subscriber: rnk. Aug 14 2014, 4:59 PM

Does gcc ignore things like UTF BOMs in response files? Clang sniffs for the UTF-16 BOM when parsing response files on Windows. The best case would be that we give gcc a UTF-8 response file with a BOM, IMO.
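For readers unfamiliar with BOM sniffing, here is a minimal sketch of the idea. The names `BomKind` and `sniffBom` are hypothetical and chosen for illustration; this is not Clang's actual response file parser, only an assumption about how such a check could look.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for illustration; not Clang's implementation.
enum class BomKind { None, UTF8, UTF16LE, UTF16BE };

// Inspect the first bytes of a buffer and report which byte order mark, if
// any, it starts with. Note: a UTF-32LE BOM (FF FE 00 00) would be reported
// as UTF16LE here; this sketch does not try to tell the two apart.
BomKind sniffBom(const std::string &Bytes) {
  auto At = [&](std::size_t I) {
    return static_cast<unsigned char>(Bytes[I]);
  };
  if (Bytes.size() >= 3 && At(0) == 0xEF && At(1) == 0xBB && At(2) == 0xBF)
    return BomKind::UTF8;
  if (Bytes.size() >= 2 && At(0) == 0xFF && At(1) == 0xFE)
    return BomKind::UTF16LE;
  if (Bytes.size() >= 2 && At(0) == 0xFE && At(1) == 0xFF)
    return BomKind::UTF16BE;
  return BomKind::None;
}
```

A reader of response files could call `sniffBom` on the raw file contents and decide whether to convert from UTF-16 before tokenizing.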

Hi Rafael and Reid,

Thanks for sharing your opinion, I appreciate it. I organized a table with the tests I ran on my Windows system. I encoded a response file with international characters in different encodings and tested it with different tools. Here are my findings:

Tool	UTF8-no-BOM	UTF8-BOM	UTF16-BOM	Current Code Page (ISO-8859-1 in my system)
GCC 4.8.1 MinGW	Fail	Fail	Fail	Works
LD 2.24 MinGW	Fail	Fail	Fail	Works
GCC 4.8.3 Cygwin	Works	Fail	Fail	Fail
LD 2.24.51 Cygwin	Works	Fail	Fail	Fail

For Cygwin, I used bash and, for MinGW programs, the Windows command prompt.

This led me to believe that:

GNU tools on Cygwin or any UNIX system accept plain UTF8 without any BOM. Using a BOM will confuse the tool. No other encoding is understood.
GNU tools on MinGW accept only the current code page of the system. Any other encoding, with or without a BOM, is not understood.

That's why I designed my patch the way it is. On Windows native or MinGW, it uses current CP or UTF16 with BOM (for MSVC tools). On UNIX (including cygwin), it always uses UTF8 without BOM.

I assumed that all GNU tools work this way and extended the information on all Clang Tool objects related to GNU to follow this as well. This is the meaning of using the enum member ResponseFileSupport::FullWithoutUTF16 in all GNU tools (no UTF16 means that it will use UTF8 on UNIX and the current code page on Windows).
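The selection rule described above can be sketched as a small function. The names `ToolFamily`, `FileEncoding` and `pickResponseFileEncoding` are hypothetical and do not appear in the patch; the real code expresses this through per-tool enum members on the Clang side.

```cpp
// Hypothetical names for illustration; they are not part of the patch.
enum class ToolFamily { GNU, MSVC };
enum class FileEncoding { UTF8, CurrentCodePage, UTF16 };

// Pick a response file encoding following the findings in the table:
// - any UNIX-like host (including Cygwin): plain UTF-8, no BOM;
// - native Windows or MinGW host calling GNU tools: current code page;
// - native Windows host calling MSVC tools: UTF-16 with a BOM.
FileEncoding pickResponseFileEncoding(ToolFamily Tool, bool HostIsWindows) {
  if (!HostIsWindows)
    return FileEncoding::UTF8;
  return Tool == ToolFamily::MSVC ? FileEncoding::UTF16
                                  : FileEncoding::CurrentCodePage;
}
```

Here `HostIsWindows` is assumed to be false for Cygwin, matching the observation that Cygwin tools behave like their UNIX counterparts.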

I will update the comments in this patch to make this clear. I will also open a bug in binutils requesting them to implement UTF8/UTF16 response files on Windows/MinGW.

Best regards,
Rafael Auler

In this new patch, I implemented Rafael's suggestion of writing an encoding enum. Thus, I changed the last parameter of writeFileWithEncoding to be an encoding enum member.

Looks pretty good. Let me know if you need help landing it when you send a revised patch.

include/llvm/Support/Program.h
142 ↗	(On Diff #12579)	The contents parameter should be a StringRef. Most callers will have the length handy.
lib/Support/Unix/Program.inc
461–465 ↗	(On Diff #12579)	How about returning a std::error_code and using std::errc::io_error here and in the Windows implementation?
lib/Support/Windows/Program.inc
465–468 ↗	(On Diff #12579)	Similarly, we can just propagate ec here if we return std::error_code.
482 ↗	(On Diff #12579)	ditto
unittests/Support/ProgramTest.cpp
275 ↗	(On Diff #12579)	Hm, looks like we can't convert from SmallString to Twine. Twine is usually an implementation detail. Can you call .str() here instead?
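As a rough illustration of the std::error_code suggestion, the sketch below reports a failed write through the return value instead of a bool plus an error string. `writeAll` is a hypothetical helper built on the C standard library, not the function from this patch.

```cpp
#include <cstdio>
#include <string>
#include <system_error>

// Hypothetical helper, not the patch's function: writes Contents to Path and
// reports any failure as std::errc::io_error.
std::error_code writeAll(const std::string &Path, const std::string &Contents) {
  std::FILE *F = std::fopen(Path.c_str(), "wb");
  if (!F)
    return std::make_error_code(std::errc::io_error);
  std::size_t Written = std::fwrite(Contents.data(), 1, Contents.size(), F);
  bool CloseFailed = std::fclose(F) != 0;
  if (Written != Contents.size() || CloseFailed)
    return std::make_error_code(std::errc::io_error);
  return std::error_code(); // default-constructed value means success
}
```

A caller can then write `if (std::error_code EC = writeAll(Path, Text))` and obtain a human-readable description from `EC.message()` only when it is needed.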

Only a nit in addition to what Reid noticed. LGTM otherwise.

include/llvm/Support/Program.h
141 ↗	(On Diff #12579)	Please include a quick summary of your table. At least say that what requires us to use EM_CurrentCodePage is that gnu tools (ld, and gcc at least) on mingw only work with the current code page. BTW, have you tested with mingw-w64 too? If they support UTF8 or UTF16 and the old mingw does not that seems like a reason to push for a switch at some point.

Just one extra nit in addition to what Reid noticed. LGTM otherwise.

Thanks for your suggestions, I will submit a revised patch soon. Rafael, regarding your comment, I didn't add a table, but I did add a comment describing which encoding each tool uses on the Clang side of the patch at http://reviews.llvm.org/D4897, file include/clang/Driver/Tool.h. However, if you want, I can add the table here too. What do you think?

Implemented rnk's and rafael's suggestions. Will update the clang part to D4897.

Rafael, I finished debugging our Arabic test case. I tested a filename with the character 0xd6 and it worked OK just using regular cp 720. My initial command line test was failing because I was using "touch.exe" to create the file, and it was creating the file with the wrong name (the response file was OK, but the filename used a different character).

Almost there. Thanks for testing all the strange codepage combinations!

include/llvm/Support/Program.h
146 ↗	(On Diff #12872)	Please don't pass in the ErrMsg string. The caller can use the EC.message() method. That is an old API design we try to avoid. I see it is used because raw_fd_ostream was never updated to avoid it :-( Suggestion: pass in a file descriptor. That way you can use a raw_fd_ostream with a saner interface and we don't spread the use of std::string pointers for error messages to other APIs. Another option: keep the filename, but use openFileForWrite + the fd raw_fd_ostream constructor.
lib/Support/Unix/Program.inc
479 ↗	(On Diff #12872)	This bug fix could be in another patch, no?
lib/Support/Windows/Program.inc
171 ↗	(On Diff #12872)	This refactoring could be in another patch, no?

Hi Rafael,

Thanks for this extra round of review, I answered your questions below. I will send an updated patch shortly.

include/llvm/Support/Program.h
146 ↗	(On Diff #12872)	I will use your updated constructor from r216293, thanks for updating it!
lib/Support/Unix/Program.inc
479 ↗	(On Diff #12872)	Ok! Sent it in http://reviews.llvm.org/D5053
lib/Support/Windows/Program.inc
171 ↗	(On Diff #12872)	Ok! Sent it in http://reviews.llvm.org/D5054

Patch addressing reviewers' concerns. I uploaded the clang side in D4897, currently being reviewed by Sean.

Added the struct "EncodingStrategy", used to represent how the user wants to encode her file in different OSes. This was used to allow the removal of several ifdefs in the clang side of the patch at D4897.

rafael added inline comments. Aug 26 2014, 7:41 AM

include/llvm/Support/Program.h
140 ↗	(On Diff #12943)	When is the UnixEncoding not UTF8? If the problem was just the assert, I would suggest just writing the Unix version as std::error_code llvm::sys::writeFileWithEncoding(const char *FileName, StringRef Contents, EncodingStrategy /*ignored*/) { and documenting that UTF8 is always used on Unix. You can even name the enum WindowsEncodingMethod to make it explicit that this is what we use on Windows.
154 ↗	(On Diff #12943)	FileName can be a StringRef now, no?

Introduce "WindowsEncodingMethod" to stress that the file encoding is only relevant on Windows systems. Remove the old "EncodingStrategy", addressing Rafael's concerns.

I don't know why, but phabricator didn't send the email here updating this
thread. Anyway, I sent a new patch addressing your concerns. It is
available at http://reviews.llvm.org/D4896.

Description:

Introduce "WindowsEncodingMethod" to stress that the file encoding is only
relevant on Windows systems. Remove the old "EncodingStrategy", addressing
Rafael's concerns.

LGTM

This revision is now accepted and ready to land. Aug 26 2014, 2:15 PM

I realize this patch has been accepted, but I am updating it in light of the discussion in D4897, to update the LLVM side of it.

I will copy and paste the description that I put in D4897, since it reflects the modifications made on both sides (LLVM and Clang):

This update refactors the new code in Job.cpp to use streams, no longer building and managing its own char buffers. Thanks to Sean’s suggestion, the code is now much simpler. However, I wanted to build a stream that could write directly to the response file with the correct encoding, but raw_fd_ostream lacks the capability to write files with different encodings. Therefore, I added this capability to raw_fd_ostream with a small modification and introduced a new constructor variant that creates a buffer that, when flushed to a file, is written in a different encoding. If you think it is inappropriate to have such a feature in raw_fd_ostream, I can work on creating my own stream class to do so.

I also refactored part of the write function in raw_fd_ostream out of this class, to avoid duplicating code in this implementation. Then, I changed my writeFileWithEncoding() function to write to a given file descriptor, which is assumed to be open, rather than opening the file myself and then closing it. This enabled me to make raw_fd_ostream work with different encodings with little extra code.

I also refactored part of the write function in raw_fd_ostream out of this class, to avoid duplicating code in this implementation.

Yes, this class is way too central to get an extra feature just for
response files.

You can maybe add a new streamer class, but I must say I am not sure
it is worth it. Response files are relatively small, so I would
probably start with what you had before: build a buffer and then dump
it. This also avoids having to handle incomplete characters
(getIncompleteUTFBytes)

Cheers,
Rafael
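To illustrate the incomplete-character problem mentioned above: a streaming converter that flushes fixed-size chunks may cut a multi-byte UTF-8 sequence in half, so it has to hold back the trailing bytes until the rest arrives. The sketch below shows one way to count those bytes. `incompleteUtf8Tail` is a hypothetical helper, not the patch's getIncompleteUTFBytes, and it assumes the input is otherwise well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns how many bytes at the end of Chunk belong to a
// UTF-8 sequence that is still incomplete (0 if the chunk ends on a character
// boundary). Assumes the input is otherwise well-formed UTF-8.
std::size_t incompleteUtf8Tail(const std::string &Chunk) {
  std::size_t N = Chunk.size();
  // A sequence is at most 4 bytes long, so its lead byte is at most 4 bytes
  // from the end.
  for (std::size_t I = 1; I <= 4 && I <= N; ++I) {
    unsigned char B = static_cast<unsigned char>(Chunk[N - I]);
    if ((B & 0xC0) == 0x80)
      continue; // continuation byte, keep looking for the lead byte
    std::size_t Expected;
    if (B < 0x80)
      Expected = 1;
    else if ((B & 0xE0) == 0xC0)
      Expected = 2;
    else if ((B & 0xF0) == 0xE0)
      Expected = 3;
    else if ((B & 0xF8) == 0xF0)
      Expected = 4;
    else
      return 0; // invalid lead byte; not handled in this sketch
    return I < Expected ? I : 0;
  }
  return 0; // empty chunk, or only continuation bytes (malformed)
}
```

Building the whole response file in memory and converting it once avoids this bookkeeping entirely, which is the simplification suggested in the message above.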

Hi Rafael,

No problem, I will just revert it to my last patch and rebase. This is the updated patch. In the Clang side, I will just use a raw_string_ostream and only call writeFileWithEncoding when the full buffer is written.
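The buffer-then-convert approach can be sketched in portable C++ as follows. Here std::ostringstream stands in for raw_string_ostream, and `utf8ToUtf16LEWithBom` and `buildEncodedResponseFile` are hypothetical helpers written for this example; the patch itself relies on the Windows conversion routines. The converter assumes well-formed UTF-8 and does not detect malformed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: converts well-formed UTF-8 to UTF-16LE bytes with a
// leading BOM. Malformed input is not detected in this sketch.
std::string utf8ToUtf16LEWithBom(const std::string &In) {
  std::string Out = "\xFF\xFE"; // UTF-16LE byte order mark
  auto Put16 = [&Out](std::uint32_t U) {
    Out.push_back(static_cast<char>(U & 0xFF));
    Out.push_back(static_cast<char>((U >> 8) & 0xFF));
  };
  for (std::size_t I = 0; I < In.size();) {
    unsigned char B = static_cast<unsigned char>(In[I]);
    // Sequence length from the lead byte.
    std::size_t Len =
        B < 0x80 ? 1 : (B >> 5) == 0x6 ? 2 : (B >> 4) == 0xE ? 3 : 4;
    // Payload bits of the lead byte, then 6 bits per continuation byte.
    std::uint32_t CP = Len == 1 ? B : B & (0xFF >> (Len + 1));
    for (std::size_t K = 1; K < Len && I + K < In.size(); ++K)
      CP = (CP << 6) | (static_cast<unsigned char>(In[I + K]) & 0x3F);
    I += Len;
    if (CP >= 0x10000) { // outside the BMP: encode as a surrogate pair
      CP -= 0x10000;
      Put16(0xD800 + (CP >> 10));
      Put16(0xDC00 + (CP & 0x3FF));
    } else {
      Put16(CP);
    }
  }
  return Out;
}

// Collect all arguments in an in-memory stream first, then convert the
// complete buffer in a single step.
std::string buildEncodedResponseFile(const std::vector<std::string> &Args) {
  std::ostringstream Buffer;
  for (const std::string &A : Args)
    Buffer << A << '\n';
  return utf8ToUtf16LEWithBom(Buffer.str());
}
```

Because the conversion sees the complete buffer, the BOM is written exactly once and no character can be split across writes.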

Best regards,
Rafael Auler

Do you need me to commit this?

Yes, please.

r217068

Revision Contents

Path	Size
include/llvm/Support/FileSystem.h	28 lines
include/llvm/Support/raw_ostream.h	26 lines
lib/Support/Path.cpp	49 lines
lib/Support/Unix/Path.inc	9 lines
lib/Support/Windows/Path.inc	63 lines
lib/Support/Windows/WindowsSupport.h	3 lines
lib/Support/raw_ostream.cpp	129 lines
unittests/Support/Path.cpp	51 lines
unittests/Support/raw_ostream_test.cpp	64 lines

Diff 13115

include/llvm/Support/FileSystem.h

Show First 20 Lines • Show All 596 Lines • ▼ Show 20 Lines	inline OpenFlags &operator|=(OpenFlags &A, OpenFlags B) {
return A;		return A;
}		}

std::error_code openFileForWrite(const Twine &Name, int &ResultFD,		std::error_code openFileForWrite(const Twine &Name, int &ResultFD,
OpenFlags Flags, unsigned Mode = 0666);		OpenFlags Flags, unsigned Mode = 0666);

std::error_code openFileForRead(const Twine &Name, int &ResultFD);		std::error_code openFileForRead(const Twine &Name, int &ResultFD);

		/// File encoding options when writing contents that a non-UTF8 tool will
		/// read (on Windows systems). For UNIX, we always use UTF-8.
		enum WindowsEncodingMethod : unsigned {
		/// UTF-8 is the LLVM native encoding, being the same as "do not perform
		/// encoding conversion".
		WEM_UTF8 = 0,
		WEM_CurrentCodePage = 1,
		WEM_UTF16 = 2
		};

		/// Saves the UTF8-encoded \p contents string into the file \p FileName
		/// using a specific encoding. This is necessary when writing files to
		/// some Windows tools that do not understand UTF-8, i.e., when generating
		/// response files that link.exe or gcc.exe will read.
		/// FIXME: We use WEM_CurrentCodePage to write response files for GNU tools in
		/// a MinGW/MinGW-w64 environment, which has serious flaws but currently is
		/// our best shot to make gcc/ld understand international characters. This
		/// should be changed as soon as binutils fix this to support UTF16 on mingw.
		/// \returns non-zero error_code if failed
		std::error_code writeFileWithEncoding(int FD, StringRef Contents,
		WindowsEncodingMethod Encoding,
		bool BeginOfFile, bool UseAtomicWrites);

		/// Loops calls to ::write() until the entire buffer is written, ignoring
		/// trivial errors.
		std::error_code writeBufferToFile(int FD, const char *Buf, ssize_t Size,
		bool UseAtomicWrites);

/// @brief Identify the type of a binary file based on how magical it is.		/// @brief Identify the type of a binary file based on how magical it is.
file_magic identify_magic(StringRef magic);		file_magic identify_magic(StringRef magic);

/// @brief Get and identify \a path's type based on its content.		/// @brief Get and identify \a path's type based on its content.
///		///
/// @param path Input path.		/// @param path Input path.
/// @param result Set to the type of file, or file_magic::unknown.		/// @param result Set to the type of file, or file_magic::unknown.
/// @returns errc::success if result has been successfully set, otherwise a		/// @returns errc::success if result has been successfully set, otherwise a
▲ Show 20 Lines • Show All 298 Lines • Show Last 20 Lines

include/llvm/Support/raw_ostream.h

Show All 21 Lines
namespace llvm {		namespace llvm {
class format_object_base;		class format_object_base;
template <typename T>		template <typename T>
class SmallVectorImpl;		class SmallVectorImpl;

namespace sys {		namespace sys {
namespace fs {		namespace fs {
enum OpenFlags : unsigned;		enum OpenFlags : unsigned;
		enum WindowsEncodingMethod : unsigned;
}		}
}		}

/// raw_ostream - This class implements an extremely fast bulk output stream		/// raw_ostream - This class implements an extremely fast bulk output stream
/// that can only output to a stream. It does not support seeking, reopening,		/// that can only output to a stream. It does not support seeking, reopening,
/// rewinding, line buffered disciplines etc. It is a simple buffer that outputs		/// rewinding, line buffered disciplines etc. It is a simple buffer that outputs
/// a chunk at a time.		/// a chunk at a time.
class raw_ostream {		class raw_ostream {
▲ Show 20 Lines • Show All 284 Lines • ▼ Show 20 Lines	class raw_fd_ostream : public raw_ostream {
bool Error;		bool Error;

/// Controls whether the stream should attempt to use atomic writes, when		/// Controls whether the stream should attempt to use atomic writes, when
/// possible.		/// possible.
bool UseAtomicWrites;		bool UseAtomicWrites;

uint64_t pos;		uint64_t pos;

		/// Controls the target encoding to convert this stream to, if necessary
		sys::fs::WindowsEncodingMethod Encoding;

		/// A buffer to avoid flushing incomplete UTF chars when working with a stream
		/// that converts encoding.
		char UTFBuf[4];
		char *UTFBufEnd;

/// write_impl - See raw_ostream::write_impl.		/// write_impl - See raw_ostream::write_impl.
void write_impl(const char *Ptr, size_t Size) override;		void write_impl(const char *Ptr, size_t Size) override;

/// current_pos - Return the current position within the stream, not		/// current_pos - Return the current position within the stream, not
/// counting the bytes currently in the buffer.		/// counting the bytes currently in the buffer.
uint64_t current_pos() const override { return pos; }		uint64_t current_pos() const override { return pos; }

/// preferred_buffer_size - Determine an efficient buffer size.		/// preferred_buffer_size - Determine an efficient buffer size.
Show All 12 Lines	public:
/// As a special case, if Filename is "-", then the stream will use		/// As a special case, if Filename is "-", then the stream will use
/// STDOUT_FILENO instead of opening a file. Note that it will still consider		/// STDOUT_FILENO instead of opening a file. Note that it will still consider
/// itself to own the file descriptor. In particular, it will close the		/// itself to own the file descriptor. In particular, it will close the
/// file descriptor when it is done (this is necessary to detect		/// file descriptor when it is done (this is necessary to detect
/// output errors).		/// output errors).
raw_fd_ostream(StringRef Filename, std::error_code &EC,		raw_fd_ostream(StringRef Filename, std::error_code &EC,
sys::fs::OpenFlags Flags);		sys::fs::OpenFlags Flags);

		/// This constructor variant adds the possibility to choose which encoding
		/// to use when writing a text file. On Windows, this is important when
		/// writing files with internationalization support with an encoding that is
		/// different from the one used in LLVM (UTF-8). We use this when writing
		/// response files, since GCC tools on MinGW only understand legacy code
		/// pages, and VisualStudio tools only understand UTF-16.
		/// For UNIX, using different encodings is silently ignored, since all tools
		/// work well with UTF-8.
		/// This mode assumes that you only use UTF-8 text data and will convert
		/// it to your desired encoding before writing to the file.
		///
		/// This variant does not accept the "-" special case for Filename.
		raw_fd_ostream(StringRef Filename, std::error_code &EC,
		sys::fs::OpenFlags Flags,
		sys::fs::WindowsEncodingMethod Encoding);


/// raw_fd_ostream ctor - FD is the file descriptor that this writes to. If		/// raw_fd_ostream ctor - FD is the file descriptor that this writes to. If
/// ShouldClose is true, this closes the file when the stream is destroyed.		/// ShouldClose is true, this closes the file when the stream is destroyed.
raw_fd_ostream(int fd, bool shouldClose, bool unbuffered=false);		raw_fd_ostream(int fd, bool shouldClose, bool unbuffered=false);

~raw_fd_ostream();		~raw_fd_ostream();

/// close - Manually flush the stream and close the file.		/// close - Manually flush the stream and close the file.
/// Note that this does not call fsync.		/// Note that this does not call fsync.
▲ Show 20 Lines • Show All 134 Lines • Show Last 20 Lines

lib/Support/Path.cpp

Show First 20 Lines • Show All 1,039 Lines • ▼ Show 20 Lines	std::error_code identify_magic(const Twine &Path, file_magic &Result) {
Result = identify_magic(StringRef(Buffer, Length));		Result = identify_magic(StringRef(Buffer, Length));
return std::error_code();		return std::error_code();
}		}

std::error_code directory_entry::status(file_status &result) const {		std::error_code directory_entry::status(file_status &result) const {
return fs::status(Path, result);		return fs::status(Path, result);
}		}

		std::error_code writeBufferToFile(int FD, const char *Buf, ssize_t Size,
		bool UseAtomicWrites) {
		do {
		ssize_t Ret;

		// Check whether we should attempt to use atomic writes.
		if (LLVM_LIKELY(!UseAtomicWrites)) {
		Ret = ::write(FD, Buf, Size);
		} else {
		// Use ::writev() where available.
		#if defined(HAVE_WRITEV)
		const void *Addr = static_cast<const void *>(Buf);
		struct iovec IOV = {const_cast<void *>(Addr), Size };
		Ret = ::writev(FD, &IOV, 1);
		#else
		Ret = ::write(FD, Buf, Size);
		#endif
		}

		if (Ret < 0) {
		// If it's a recoverable error, swallow it and retry the write.
		//
		// Ideally we wouldn't ever see EAGAIN or EWOULDBLOCK here, since
		// raw_ostream isn't designed to do non-blocking I/O. However, some
		// programs, such as old versions of bjam, have mistakenly used
		// O_NONBLOCK. For compatibility, emulate blocking semantics by
		// spinning until the write succeeds. If you don't want spinning,
		// don't use O_NONBLOCK file descriptors with raw_ostream.
		if (errno == EINTR || errno == EAGAIN
		#ifdef EWOULDBLOCK
		|| errno == EWOULDBLOCK
		#endif
		)
		continue;

		// Otherwise it's a non-recoverable error.
		return std::make_error_code(std::errc::io_error);
		}

		// The write may have written some or all of the data. Update the
		// size and buffer pointer to reflect the remainder that needs
		// to be written. If there are no bytes left, we're done.
		Buf += Ret;
		Size -= Ret;
		} while (Size > 0);

		return std::error_code();
		}

} // end namespace fs		} // end namespace fs
} // end namespace sys		} // end namespace sys
} // end namespace llvm		} // end namespace llvm

// Include the truly platform-specific parts.		// Include the truly platform-specific parts.
#if defined(LLVM_ON_UNIX)		#if defined(LLVM_ON_UNIX)
#include "Unix/Path.inc"		#include "Unix/Path.inc"
#endif		#endif
#if defined(LLVM_ON_WIN32)		#if defined(LLVM_ON_WIN32)
#include "Windows/Path.inc"		#include "Windows/Path.inc"
#endif		#endif

lib/Support/Unix/Path.inc

Show First 20 Lines • Show All 89 Lines • ▼ Show 20 Lines
namespace llvm {		namespace llvm {
namespace sys {		namespace sys {
namespace fs {		namespace fs {
#if defined(__FreeBSD__) || defined (__NetBSD__) || defined(__Bitrig__) || \		#if defined(__FreeBSD__) || defined (__NetBSD__) || defined(__Bitrig__) || \
defined(__OpenBSD__) || defined(__minix) || defined(__FreeBSD_kernel__) || \		defined(__OpenBSD__) || defined(__minix) || defined(__FreeBSD_kernel__) || \
defined(__linux__) || defined(__CYGWIN__) || defined(__DragonFly__)		defined(__linux__) || defined(__CYGWIN__) || defined(__DragonFly__)
static int		static int
test_dir(char ret[PATH_MAX], const char *dir, const char *bin)		test_dir(char ret[PATH_MAX], const char *dir, const char *bin)
{		{
struct stat sb;		struct stat sb;
char fullpath[PATH_MAX];		char fullpath[PATH_MAX];

snprintf(fullpath, PATH_MAX, "%s/%s", dir, bin);		snprintf(fullpath, PATH_MAX, "%s/%s", dir, bin);
if (realpath(fullpath, ret) == NULL)		if (realpath(fullpath, ret) == NULL)
return (1);		return (1);
if (stat(fullpath, &sb) != 0)		if (stat(fullpath, &sb) != 0)
return (1);		return (1);
▲ Show 20 Lines • Show All 523 Lines • ▼ Show 20 Lines	std::error_code openFileForWrite(const Twine &Name, int &ResultFD,
StringRef P = Name.toNullTerminatedStringRef(Storage);		StringRef P = Name.toNullTerminatedStringRef(Storage);
while ((ResultFD = open(P.begin(), OpenFlags, Mode)) < 0) {		while ((ResultFD = open(P.begin(), OpenFlags, Mode)) < 0) {
if (errno != EINTR)		if (errno != EINTR)
return std::error_code(errno, std::generic_category());		return std::error_code(errno, std::generic_category());
}		}
return std::error_code();		return std::error_code();
}		}

		std::error_code writeFileWithEncoding(
		int FD, StringRef Contents, WindowsEncodingMethod Encoding /*ignored*/,
		bool BeginOfFile /*ignored*/, bool UseAtomicWrites) {
		return writeBufferToFile(FD, Contents.data(), Contents.size(),
		UseAtomicWrites);
		}

} // end namespace fs		} // end namespace fs

namespace path {		namespace path {

bool home_directory(SmallVectorImpl<char> &result) {		bool home_directory(SmallVectorImpl<char> &result) {
if (char *RequestedDir = getenv("HOME")) {		if (char *RequestedDir = getenv("HOME")) {
result.clear();		result.clear();
result.append(RequestedDir, RequestedDir + strlen(RequestedDir));		result.append(RequestedDir, RequestedDir + strlen(RequestedDir));
▲ Show 20 Lines • Show All 70 Lines • Show Last 20 Lines

lib/Support/Windows/Path.inc

Show All 11 Lines
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//=== WARNING: Implementation here must contain only generic Windows code that		//=== WARNING: Implementation here must contain only generic Windows code that
//=== is guaranteed to work on all Windows variants.		//=== is guaranteed to work on all Windows variants.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
		#include "llvm/Support/ConvertUTF.h"
#include "llvm/Support/WindowsError.h"		#include "llvm/Support/WindowsError.h"
#include <fcntl.h>		#include <fcntl.h>
#include <io.h>		#include <io.h>
#include <sys/stat.h>		#include <sys/stat.h>
#include <sys/types.h>		#include <sys/types.h>

// These two headers must be included last, and make sure shlobj is required		// These two headers must be included last, and make sure shlobj is required
// after Windows.h to make sure it picks up our definition of _WIN32_WINNT		// after Windows.h to make sure it picks up our definition of _WIN32_WINNT
▲ Show 20 Lines • Show All 796 Lines • ▼ Show 20 Lines	std::error_code openFileForWrite(const Twine &Name, int &ResultFD,
if (FD == -1) {		if (FD == -1) {
::CloseHandle(H);		::CloseHandle(H);
return windows_error(ERROR_INVALID_HANDLE);		return windows_error(ERROR_INVALID_HANDLE);
}		}

ResultFD = FD;		ResultFD = FD;
return std::error_code();		return std::error_code();
}		}

		std::error_code writeFileWithEncoding(int FD, StringRef Contents,
		WindowsEncodingMethod Encoding,
		bool BeginOfFile, bool UseAtomicWrites) {
		if (Encoding == WEM_UTF8) {
		return writeBufferToFile(FD, Contents.data(), Contents.size(),
		UseAtomicWrites);
		} else if (Encoding == WEM_CurrentCodePage) {
		SmallVector<wchar_t, 1> ArgsUTF16;
		SmallVector<char, 1> ArgsCurCP;

		if (std::error_code EC = windows::UTF8ToUTF16(Contents, ArgsUTF16))
		return EC;

		if (std::error_code EC =
		windows::UTF16ToCurCP(ArgsUTF16.data(), ArgsUTF16.size(), ArgsCurCP))
		return EC;

		return writeBufferToFile(FD, ArgsCurCP.data(), ArgsCurCP.size(),
		UseAtomicWrites);
		} else if (Encoding == WEM_UTF16) {
		SmallVector<wchar_t, 1> ArgsUTF16;

		if (std::error_code EC = windows::UTF8ToUTF16(Contents, ArgsUTF16))
		return EC;

		if (BeginOfFile) {
		// Endianness guessing - Write BOM in the first write to this file.
		char BOM[2];
		uint16_t src = UNI_UTF16_BYTE_ORDER_MARK_NATIVE;
		memcpy(BOM, &src, 2);
		if (std::error_code EC = writeBufferToFile(FD, BOM, 2, UseAtomicWrites))
		return EC;
		}

		return writeBufferToFile(FD, (char *)ArgsUTF16.data(), ArgsUTF16.size() << 1,
		UseAtomicWrites);
		}

		llvm_unreachable("Unknown encoding");
		return std::make_error_code(std::errc::io_error);
		}
} // end namespace fs		} // end namespace fs

namespace path {		namespace path {

bool home_directory(SmallVectorImpl<char> &result) {		bool home_directory(SmallVectorImpl<char> &result) {
wchar_t Path[MAX_PATH];		wchar_t Path[MAX_PATH];
if (::SHGetFolderPathW(0, CSIDL_APPDATA | CSIDL_FLAG_CREATE, 0,		if (::SHGetFolderPathW(0, CSIDL_APPDATA | CSIDL_FLAG_CREATE, 0,
/*SHGFP_TYPE_CURRENT*/0, Path) != S_OK)		/*SHGFP_TYPE_CURRENT*/0, Path) != S_OK)
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	std::error_code UTF8ToUTF16(llvm::StringRef utf8,

// Make utf16 null terminated.		// Make utf16 null terminated.
utf16.push_back(0);		utf16.push_back(0);
utf16.pop_back();		utf16.pop_back();

return std::error_code();		return std::error_code();
}		}

std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,		static
		std::error_code UTF16ToCodePage(unsigned codepage, const wchar_t *utf16,
		size_t utf16_len,
llvm::SmallVectorImpl<char> &utf8) {		llvm::SmallVectorImpl<char> &utf8) {
if (utf16_len) {		if (utf16_len) {
// Get length.		// Get length.
int len = ::WideCharToMultiByte(CP_UTF8, 0, utf16, utf16_len, utf8.begin(),		int len = ::WideCharToMultiByte(codepage, 0, utf16, utf16_len, utf8.begin(),
0, NULL, NULL);		0, NULL, NULL);

if (len == 0)		if (len == 0)
return windows_error(::GetLastError());		return windows_error(::GetLastError());

utf8.reserve(len);		utf8.reserve(len);
utf8.set_size(len);		utf8.set_size(len);

// Now do the actual conversion.		// Now do the actual conversion.
len = ::WideCharToMultiByte(CP_UTF8, 0, utf16, utf16_len, utf8.data(),		len = ::WideCharToMultiByte(codepage, 0, utf16, utf16_len, utf8.data(),
utf8.size(), NULL, NULL);		utf8.size(), NULL, NULL);

if (len == 0)		if (len == 0)
return windows_error(::GetLastError());		return windows_error(::GetLastError());
}		}

// Make utf8 null terminated.		// Make utf8 null terminated.
utf8.push_back(0);		utf8.push_back(0);
utf8.pop_back();		utf8.pop_back();

return std::error_code();		return std::error_code();
}		}

		std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,
		llvm::SmallVectorImpl<char> &utf8) {
		return UTF16ToCodePage(CP_UTF8, utf16, utf16_len, utf8);
		}

		std::error_code UTF16ToCurCP(const wchar_t *utf16, size_t utf16_len,
		llvm::SmallVectorImpl<char> &utf8) {
		return UTF16ToCodePage(CP_ACP, utf16, utf16_len, utf8);
		}
} // end namespace windows		} // end namespace windows
} // end namespace sys		} // end namespace sys
} // end namespace llvm		} // end namespace llvm

lib/Support/Windows/WindowsSupport.h

Show First 20 Lines • Show All 160 Lines • ▼ Show 20 Lines	c_str(SmallVectorImpl<T> &str) {
return str.data();		return str.data();
}		}

namespace sys {		namespace sys {
namespace windows {		namespace windows {
std::error_code UTF8ToUTF16(StringRef utf8, SmallVectorImpl<wchar_t> &utf16);		std::error_code UTF8ToUTF16(StringRef utf8, SmallVectorImpl<wchar_t> &utf16);
std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,		std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,
SmallVectorImpl<char> &utf8);		SmallVectorImpl<char> &utf8);
		/// Convert from UTF16 to the current code page used in the system
		std::error_code UTF16ToCurCP(const wchar_t *utf16, size_t utf16_len,
		SmallVectorImpl<char> &utf8);
} // end namespace windows		} // end namespace windows
} // end namespace sys		} // end namespace sys
} // end namespace llvm.		} // end namespace llvm.

lib/Support/raw_ostream.cpp

Show All 11 Lines
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/Config/config.h"		#include "llvm/Config/config.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
		#include "llvm/Support/ConvertUTF.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Format.h"		#include "llvm/Support/Format.h"
#include "llvm/Support/Process.h"		#include "llvm/Support/Process.h"
#include "llvm/Support/Program.h"		#include "llvm/Support/Program.h"
#include <cctype>		#include <cctype>
#include <cerrno>		#include <cerrno>
#include <sys/stat.h>		#include <sys/stat.h>
▲ Show 20 Lines • Show All 395 Lines • ▼ Show 20 Lines
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// raw_fd_ostream		// raw_fd_ostream
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

raw_fd_ostream::raw_fd_ostream(StringRef Filename, std::error_code &EC,		raw_fd_ostream::raw_fd_ostream(StringRef Filename, std::error_code &EC,
sys::fs::OpenFlags Flags)		sys::fs::OpenFlags Flags)
: Error(false), UseAtomicWrites(false), pos(0) {		: Error(false), UseAtomicWrites(false), pos(0), Encoding(sys::fs::WEM_UTF8),
		UTFBufEnd(UTFBuf) {
EC = std::error_code();		EC = std::error_code();
// Handle "-" as stdout. Note that when we do this, we consider ourself		// Handle "-" as stdout. Note that when we do this, we consider ourself
// the owner of stdout. This means that we can do things like close the		// the owner of stdout. This means that we can do things like close the
// file descriptor when we're done and set the "binary" flag globally.		// file descriptor when we're done and set the "binary" flag globally.
if (Filename == "-") {		if (Filename == "-") {
FD = STDOUT_FILENO;		FD = STDOUT_FILENO;
// If user requested binary then put stdout into binary mode if		// If user requested binary then put stdout into binary mode if
// possible.		// possible.
[... 10 lines not shown ...]
if (EC) {		if (EC) {
ShouldClose = false;		ShouldClose = false;
return;		return;
}		}

// Ok, we successfully opened the file, so it'll need to be closed.		// Ok, we successfully opened the file, so it'll need to be closed.
ShouldClose = true;		ShouldClose = true;
}		}

		raw_fd_ostream::raw_fd_ostream(StringRef Filename, std::error_code &EC,
		sys::fs::OpenFlags Flags,
		sys::fs::WindowsEncodingMethod _Encoding)
		: Error(false), UseAtomicWrites(false), pos(0), Encoding(_Encoding),
		UTFBufEnd(UTFBuf) {
		EC = sys::fs::openFileForWrite(Filename, FD, Flags);

		if (EC) {
		ShouldClose = false;
		return;
		}

		// Ok, we successfully opened the file, so it'll need to be closed.
		ShouldClose = true;
		}

/// raw_fd_ostream ctor - FD is the file descriptor that this writes to. If		/// raw_fd_ostream ctor - FD is the file descriptor that this writes to. If
/// ShouldClose is true, this closes the file when the stream is destroyed.		/// ShouldClose is true, this closes the file when the stream is destroyed.
raw_fd_ostream::raw_fd_ostream(int fd, bool shouldClose, bool unbuffered)		raw_fd_ostream::raw_fd_ostream(int fd, bool shouldClose, bool unbuffered)
: raw_ostream(unbuffered), FD(fd),		: raw_ostream(unbuffered), FD(fd),
ShouldClose(shouldClose), Error(false), UseAtomicWrites(false) {		ShouldClose(shouldClose), Error(false), UseAtomicWrites(false),
		Encoding(sys::fs::WEM_UTF8), UTFBufEnd(UTFBuf) {
#ifdef O_BINARY		#ifdef O_BINARY
// Setting STDOUT to binary mode is necessary in Win32		// Setting STDOUT to binary mode is necessary in Win32
// to avoid undesirable linefeed conversion.		// to avoid undesirable linefeed conversion.
// Don't touch STDERR, or w*printf() (in assert()) would barf wide chars.		// Don't touch STDERR, or w*printf() (in assert()) would barf wide chars.
if (fd == STDOUT_FILENO)		if (fd == STDOUT_FILENO)
setmode(fd, O_BINARY);		setmode(fd, O_BINARY);
#endif		#endif

// Get the starting position.		// Get the starting position.
off_t loc = ::lseek(FD, 0, SEEK_CUR);		off_t loc = ::lseek(FD, 0, SEEK_CUR);
if (loc == (off_t)-1)		if (loc == (off_t)-1)
pos = 0;		pos = 0;
else		else
pos = static_cast<uint64_t>(loc);		pos = static_cast<uint64_t>(loc);
}		}

raw_fd_ostream::~raw_fd_ostream() {		raw_fd_ostream::~raw_fd_ostream() {
		assert(UTFBuf == UTFBufEnd && "Should not hold incomplete UTF chars");
if (FD >= 0) {		if (FD >= 0) {
flush();		flush();
if (ShouldClose)		if (ShouldClose)
while (::close(FD) != 0)		while (::close(FD) != 0)
if (errno != EINTR) {		if (errno != EINTR) {
error_detected();		error_detected();
break;		break;
}		}
[... 10 lines not shown ...]
#endif		#endif
// If there are any pending errors, report them now. Clients wishing		// If there are any pending errors, report them now. Clients wishing
// to avoid report_fatal_error calls should check for errors with		// to avoid report_fatal_error calls should check for errors with
// has_error() and clear the error flag with clear_error() before		// has_error() and clear the error flag with clear_error() before
// destructing raw_ostream objects which may have errors.		// destructing raw_ostream objects which may have errors.
if (has_error())		if (has_error())
report_fatal_error("IO failure on output stream.", /*GenCrashDiag=*/false);		report_fatal_error("IO failure on output stream.", /*GenCrashDiag=*/false);
}		}

		/// Analyzes the last 3 bytes of the buffer <Ptr, Size>.
		/// If it detects an incomplete UTF char at the end of the buffer, it copies
		/// them to buffer Out and updates OutEnd (which points to the end of the out
		/// buffer).
		/// Returns the number of bytes copied.
		static unsigned getIncompleteUTFBytes(const char *Ptr, size_t Size,
		char *Out, char *&OutEnd) {
		char Last3[3] = {'\0', '\0', '\0'};

		// Copy last 3 bytes of the buffer, the largest possible incomplete UTF char
		int Idx = 2;
		const char *Cur = Ptr + Size - 1;
		while ((Idx >= 0) && (Cur >= Ptr))
		Last3[Idx--] = *Cur--;

		// Are the last bytes of the buffer the begin of a UTF multibyte char?
		if (getNumBytesForUTF8(Last3[2]) > 1) {
		Out[0] = Last3[2];
		OutEnd = &Out[1];
		return 1;
		}
		if (getNumBytesForUTF8(Last3[1]) > 2) {
		Out[0] = Last3[1];
		Out[1] = Last3[2];
		OutEnd = &Out[2];
		return 2;
		}
		if (getNumBytesForUTF8(Last3[0]) > 3) {
		Out[0] = Last3[0];
		Out[1] = Last3[1];
		Out[2] = Last3[2];
		OutEnd = &Out[3];
		return 3;
		}
		OutEnd = Out;
		return 0;
		}

void raw_fd_ostream::write_impl(const char *Ptr, size_t Size) {		void raw_fd_ostream::write_impl(const char *Ptr, size_t Size) {
assert(FD >= 0 && "File already closed.");		assert(FD >= 0 && "File already closed.");
pos += Size;

do {		// If user requested a different encoding, we must manage incomplete UTF8
ssize_t ret;		// chars at the end of the buffer.
		if (Encoding != sys::fs::WEM_UTF8) {
		std::vector<char> NewBuf;
		const size_t SizeIncomplete = (size_t) (UTFBufEnd - UTFBuf);
		size_t NewSize = Size + SizeIncomplete;

		// Add incomplete UTF char from previous flush
		if (SizeIncomplete > 0) {
		NewBuf.resize(NewSize);
		NewBuf.insert(NewBuf.begin(), UTFBuf, UTFBufEnd);
		NewBuf.insert(NewBuf.begin() + SizeIncomplete, Ptr, Ptr + Size);
		Ptr = NewBuf.data();
		}

		// Detect incomplete UTF char at the end of the buffer and defer it to
		// the next flush
		if (unsigned Num = getIncompleteUTFBytes(Ptr, NewSize, UTFBuf, UTFBufEnd))
		NewSize -= Num;

		if ((sys::fs::writeFileWithEncoding(
		FD, StringRef(Ptr, NewSize), Encoding, /*BeginOfFile=*/pos == 0,
		UseAtomicWrites)))
		error_detected();

// Check whether we should attempt to use atomic writes.		pos += Size;
if (LLVM_LIKELY(!UseAtomicWrites)) {		return;
ret = ::write(FD, Ptr, Size);
} else {
// Use ::writev() where available.
#if defined(HAVE_WRITEV)
const void *Addr = static_cast<const void *>(Ptr);
struct iovec IOV = {const_cast<void *>(Addr), Size };
ret = ::writev(FD, &IOV, 1);
#else
ret = ::write(FD, Ptr, Size);
#endif
}		}

if (ret < 0) {		pos += Size;
// If it's a recoverable error, swallow it and retry the write.
//
// Ideally we wouldn't ever see EAGAIN or EWOULDBLOCK here, since
// raw_ostream isn't designed to do non-blocking I/O. However, some
// programs, such as old versions of bjam, have mistakenly used
// O_NONBLOCK. For compatibility, emulate blocking semantics by
// spinning until the write succeeds. If you don't want spinning,
// don't use O_NONBLOCK file descriptors with raw_ostream.
if (errno == EINTR || errno == EAGAIN
#ifdef EWOULDBLOCK
|| errno == EWOULDBLOCK
#endif
)
continue;

// Otherwise it's a non-recoverable error. Note it and quit.		if ((sys::fs::writeBufferToFile(FD, Ptr, Size, UseAtomicWrites)))
error_detected();		error_detected();
break;
}

// The write may have written some or all of the data. Update the
// size and buffer pointer to reflect the remainder that needs
// to be written. If there are no bytes left, we're done.
Ptr += ret;
Size -= ret;
} while (Size > 0);
}		}

void raw_fd_ostream::close() {		void raw_fd_ostream::close() {
assert(ShouldClose);		assert(ShouldClose);
ShouldClose = false;		ShouldClose = false;
flush();		flush();
while (::close(FD) != 0)		while (::close(FD) != 0)
if (errno != EINTR) {		if (errno != EINTR) {
▲ Show 20 Lines • Show All 203 Lines • Show Last 20 Lines
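
The doc comment on getIncompleteUTFBytes() above describes inspecting the last three bytes of a buffer for the start of a multibyte UTF-8 character. The following is a minimal standalone sketch of that idea, not the patch's code. It classifies lead bytes by their bit pattern instead of calling getNumBytesForUTF8(), and the names utf8SequenceLength and incompleteUtf8Tail are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Total length of the UTF-8 sequence that starts with Lead, or 0 if Lead
// is a continuation byte or not a valid lead byte.
// (Hypothetical helper; the patch uses getNumBytesForUTF8 instead.)
std::size_t utf8SequenceLength(unsigned char Lead) {
  if (Lead < 0x80) return 1;            // 0xxxxxxx: single-byte (ASCII)
  if ((Lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((Lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((Lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;
}

// Returns how many bytes at the end of Buf belong to a multibyte sequence
// that is not yet complete, or 0 if Buf ends on a character boundary.
// At most 3 bytes can be pending, since the longest sequence is 4 bytes.
std::size_t incompleteUtf8Tail(const std::string &Buf) {
  const std::size_t N = Buf.size();
  for (std::size_t Back = 1; Back <= 3 && Back <= N; ++Back) {
    unsigned char C = static_cast<unsigned char>(Buf[N - Back]);
    if ((C & 0xC0) == 0x80)
      continue;  // continuation byte: keep walking back to find the lead
    // C starts a sequence; it is incomplete if it needs more than the
    // Back bytes that are available from here to the end of the buffer.
    return utf8SequenceLength(C) > Back ? Back : 0;
  }
  return 0;
}
```

Walking backwards until a lead byte is found means continuation bytes are never mistaken for the start of a character; input that is not well-formed UTF-8 is simply reported as having no pending tail.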

unittests/Support/Path.cpp

[... 618 lines not shown ...]
fs::mapped_file_region m(Twine(TempPath),		fs::mapped_file_region m(Twine(TempPath),
0,		0,
EC);		EC);
ASSERT_NO_ERROR(EC);		ASSERT_NO_ERROR(EC);
const char *Data = m.const_data();		const char *Data = m.const_data();
fs::mapped_file_region mfrrv(std::move(m));		fs::mapped_file_region mfrrv(std::move(m));
EXPECT_EQ(mfrrv.const_data(), Data);		EXPECT_EQ(mfrrv.const_data(), Data);
}		}

		#ifdef LLVM_ON_WIN32
		const char UTF16LE_Text[] =
		"\x6c\x00\x69\x00\x6e\x00\x67\x00\xfc\x00\x69\x00\xe7\x00\x61\x00";
		const char UTF16BE_Text[] =
		"\x00\x6c\x00\x69\x00\x6e\x00\x67\x00\xfc\x00\x69\x00\xe7\x00\x61";
		#endif
		const char UTF8Text[] = "\x6c\x69\x6e\x67\xc3\xbc\x69\xc3\xa7\x61";

		TEST_F(FileSystemTest, TestWriteFileWithEncoding) {
		// Create a temp file for writing
		int FileDescriptor1 = 0;
		SmallString<128> FilePathname(TestDirectory);

		path::append(FilePathname, "international-file.txt");
		ASSERT_NO_ERROR(fs::openFileForWrite(FilePathname.c_str(), FileDescriptor1,
		fs::OpenFlags::F_Text));

		// Only on Windows we should encode in UTF16. For other systems, use UTF8
		ASSERT_NO_ERROR(sys::fs::writeFileWithEncoding(FileDescriptor1, UTF8Text,
		sys::fs::WEM_UTF16, true,
		false));

		::close(FileDescriptor1);

		// Now open the file for reading and confirm the encoding
		int FileDescriptor2 = 0;
		ASSERT_NO_ERROR(fs::openFileForRead(FilePathname.c_str(), FileDescriptor2));

		// On Windows, test for UTF16 variants
		#if defined(LLVM_ON_WIN32)
		char Buf[18];
		ASSERT_EQ(::read(FileDescriptor2, Buf, 18), 18);
		if (strncmp(Buf, "\xfe\xff", 2) == 0) { // UTF16-BE
		ASSERT_EQ(strncmp(&Buf[2], UTF16BE_Text, 16), 0);
		} else if (strncmp(Buf, "\xff\xfe", 2) == 0) { // UTF16-LE
		ASSERT_EQ(strncmp(&Buf[2], UTF16LE_Text, 16), 0);
		} else {
		FAIL() << "Invalid BOM in UTF-16 file";
		}
		#else
		// On UNIX, test for UTF8
		char Buf[10];
		ASSERT_EQ(::read(FileDescriptor2, Buf, 10), 10);
		ASSERT_EQ(strncmp(Buf, UTF8Text, 10), 0);
		#endif

		// Now close the file and delete it
		::close(FileDescriptor2);
		ASSERT_NO_ERROR(fs::remove(FilePathname.str()));
		}

TEST(Support, NormalizePath) {		TEST(Support, NormalizePath) {
#if defined(LLVM_ON_WIN32)		#if defined(LLVM_ON_WIN32)
#define EXPECT_PATH_IS(path__, windows__, not_windows__) \		#define EXPECT_PATH_IS(path__, windows__, not_windows__) \
EXPECT_EQ(path__, windows__);		EXPECT_EQ(path__, windows__);
#else		#else
#define EXPECT_PATH_IS(path__, windows__, not_windows__) \		#define EXPECT_PATH_IS(path__, windows__, not_windows__) \
EXPECT_EQ(path__, not_windows__);		EXPECT_EQ(path__, not_windows__);
#endif		#endif
Show All 29 Lines
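
TestWriteFileWithEncoding above reads 18 bytes on Windows: a 2-byte BOM followed by 8 UTF-16 code units of 2 bytes each, produced from the 10-byte UTF-8 string. The sketch below reproduces that arithmetic with a hand-written converter. It is only an illustration under stated assumptions: utf8ToUtf16LEWithBOM is a hypothetical name, the input is assumed to be well-formed UTF-8, only the little-endian variant is produced, and the conversion the patch itself performs on Windows is not shown in this part of the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Converts well-formed UTF-8 to UTF-16LE bytes, prefixed with the FF FE BOM.
// Hypothetical helper for illustration; performs no validation.
std::string utf8ToUtf16LEWithBOM(const std::string &In) {
  std::string Out = "\xff\xfe";  // little-endian byte order mark
  auto Put16 = [&Out](std::uint16_t U) {
    Out.push_back(static_cast<char>(U & 0xFF));  // low byte first
    Out.push_back(static_cast<char>(U >> 8));
  };
  for (std::size_t I = 0; I < In.size();) {
    unsigned char C = static_cast<unsigned char>(In[I]);
    std::uint32_t CP;
    std::size_t Len;
    // Decode the lead byte: payload bits and total sequence length.
    if (C < 0x80)                { CP = C;        Len = 1; }
    else if ((C & 0xE0) == 0xC0) { CP = C & 0x1F; Len = 2; }
    else if ((C & 0xF0) == 0xE0) { CP = C & 0x0F; Len = 3; }
    else                         { CP = C & 0x07; Len = 4; }
    // Each continuation byte contributes 6 more bits.
    for (std::size_t K = 1; K < Len && I + K < In.size(); ++K)
      CP = (CP << 6) | (static_cast<unsigned char>(In[I + K]) & 0x3F);
    I += Len;
    if (CP >= 0x10000) {
      // Outside the BMP: encode as a surrogate pair.
      CP -= 0x10000;
      Put16(static_cast<std::uint16_t>(0xD800 + (CP >> 10)));
      Put16(static_cast<std::uint16_t>(0xDC00 + (CP & 0x3FF)));
    } else {
      Put16(static_cast<std::uint16_t>(CP));
    }
  }
  return Out;
}
```

One caveat when checking such output: UTF-16 text for ASCII characters contains NUL bytes, so strncmp stops comparing at the first one, whereas memcmp compares the full requested length.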

unittests/Support/raw_ostream_test.cpp

	//===- llvm/unittest/Support/raw_ostream_test.cpp - raw_ostream tests -----===//			//===- llvm/unittest/Support/raw_ostream_test.cpp - raw_ostream tests -----===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "gtest/gtest.h"			#include "gtest/gtest.h"
	#include "llvm/ADT/SmallString.h"			#include "llvm/ADT/SmallString.h"
				#include "llvm/Support/FileSystem.h"
	#include "llvm/Support/Format.h"			#include "llvm/Support/Format.h"
				#include "llvm/Support/Path.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"

	using namespace llvm;			using namespace llvm;

				#define ASSERT_NO_ERROR(x) \
				if (std::error_code ASSERT_NO_ERROR_ec = x) { \
				SmallString<128> MessageStorage; \
				raw_svector_ostream Message(MessageStorage); \
				Message << #x ": did not return errc::success.\n" \
				<< "error number: " << ASSERT_NO_ERROR_ec.value() << "\n" \
				<< "error message: " << ASSERT_NO_ERROR_ec.message() << "\n"; \
				GTEST_FATAL_FAILURE_(MessageStorage.c_str()); \
				} else { \
				}


	namespace {			namespace {

	template<typename T> std::string printToString(const T &Value) {			template<typename T> std::string printToString(const T &Value) {
	std::string res;			std::string res;
	llvm::raw_string_ostream(res) << Value;			llvm::raw_string_ostream(res) << Value;
	return res;			return res;
	}			}

	[... 87 lines not shown ...]
	TEST(raw_ostreamTest, BufferEdge) {			TEST(raw_ostreamTest, BufferEdge) {
	EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 1));			EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 1));
	EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 2));			EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 2));
	EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 3));			EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 3));
	EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 4));			EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 4));
	EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 10));			EXPECT_EQ("1.20", printToString(format("%.2f", 1.2), 10));
	}			}

				#ifdef LLVM_ON_WIN32
				const char UTF16LE_Text[] =
				"\x6c\x00\x69\x00\x6e\x00\x67\x00\xfc\x00\x69\x00\xe7\x00\x61\x00";
				const char UTF16BE_Text[] =
				"\x00\x6c\x00\x69\x00\x6e\x00\x67\x00\xfc\x00\x69\x00\xe7\x00\x61";
				#endif
				const char UTF8Text[] = "\x6c\x69\x6e\x67\xc3\xbc\x69\xc3\xa7\x61";

				TEST(raw_ostreamTest, Encodedraw_fd_ostream) {
				SmallString<128> TestDirectory;
				ASSERT_NO_ERROR(llvm::sys::fs::createUniqueDirectory("raw_fd_ostream-test",
				TestDirectory));
				errs() << "Test Directory: " << TestDirectory << '\n';
				errs().flush();
				SmallString<128> FilePathname(TestDirectory);
				llvm::sys::path::append(FilePathname, "international-file.txt");
				// Only on Windows this should encode in UTF16. For other systems, it should
				// ignore our request and encode in UTF8
				std::error_code EC;
				raw_fd_ostream OS(FilePathname, EC, sys::fs::OpenFlags::F_Text,
				sys::fs::WEM_UTF16);
				ASSERT_FALSE(EC);
				OS << "\x6c\x69\x6e\x67\xc3"; // interrupt the UTF8 char
				OS.flush();
				OS << "\xbc\x69\xc3\xa7\x61"; // resume
				OS.close();
				ASSERT_FALSE(OS.has_error());
				int FileDescriptor = 0;
				ASSERT_NO_ERROR(llvm::sys::fs::openFileForRead(FilePathname.c_str(),
				FileDescriptor));
				#if defined(LLVM_ON_WIN32)
				char Buf[18];
				ASSERT_EQ(::read(FileDescriptor, Buf, 18), 18);
				if (strncmp(Buf, "\xfe\xff", 2) == 0) { // UTF16-BE
				ASSERT_EQ(strncmp(&Buf[2], UTF16BE_Text, 16), 0);
				} else if (strncmp(Buf, "\xff\xfe", 2) == 0) { // UTF16-LE
				ASSERT_EQ(strncmp(&Buf[2], UTF16LE_Text, 16), 0);
				} else {
				FAIL() << "Invalid BOM in UTF-16 file";
				}
				#else
				char Buf[10];
				ASSERT_EQ(::read(FileDescriptor, Buf, 10), 10);
				ASSERT_EQ(strncmp(Buf, UTF8Text, 10), 0);
				#endif
				::close(FileDescriptor);
				ASSERT_NO_ERROR(llvm::sys::fs::remove(FilePathname.str()));
				ASSERT_NO_ERROR(llvm::sys::fs::remove(TestDirectory.str()));
				}

	TEST(raw_ostreamTest, TinyBuffer) {			TEST(raw_ostreamTest, TinyBuffer) {
	std::string Str;			std::string Str;
	raw_string_ostream OS(Str);			raw_string_ostream OS(Str);
	OS.SetBufferSize(1);			OS.SetBufferSize(1);
	OS << "hello";			OS << "hello";
	OS << 1;			OS << 1;
	OS << 'w' << 'o' << 'r' << 'l' << 'd';			OS << 'w' << 'o' << 'r' << 'l' << 'd';
	EXPECT_EQ("hello1world", OS.str());			EXPECT_EQ("hello1world", OS.str());
	Show All 19 Lines
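
The Encodedraw_fd_ostream test above flushes in the middle of a two-byte UTF-8 character and expects the output to be intact once the remaining byte arrives. A minimal standalone sketch of that carry-over behaviour follows. It is not the patch's write_impl(): Utf8ChunkSplitter and pendingTailLength are hypothetical names, std::string stands in for the stream's buffers, and well-formed UTF-8 input is assumed.

```cpp
#include <cstddef>
#include <string>

// Number of trailing bytes in Buf that start a UTF-8 sequence which is
// not complete yet (0 if Buf ends on a character boundary).
// Hypothetical helper for this sketch.
std::size_t pendingTailLength(const std::string &Buf) {
  const std::size_t N = Buf.size();
  for (std::size_t Back = 1; Back <= 3 && Back <= N; ++Back) {
    unsigned char C = static_cast<unsigned char>(Buf[N - Back]);
    if ((C & 0xC0) == 0x80)
      continue;  // continuation byte: keep looking for the lead byte
    std::size_t Need = C < 0x80            ? 1
                     : (C & 0xE0) == 0xC0  ? 2
                     : (C & 0xF0) == 0xE0  ? 3
                     : (C & 0xF8) == 0xF0  ? 4
                                           : 0;
    return Need > Back ? Back : 0;
  }
  return 0;
}

// Holds back an incomplete trailing character between writes so that
// every chunk handed to the encoder ends on a character boundary.
class Utf8ChunkSplitter {
  std::string Pending;  // bytes deferred from the previous push()

public:
  // Returns the part of (Pending + Chunk) that ends on a character
  // boundary and stores any incomplete tail for the next call.
  std::string push(const std::string &Chunk) {
    std::string All = Pending + Chunk;
    const std::size_t Tail = pendingTailLength(All);
    Pending = All.substr(All.size() - Tail);
    All.resize(All.size() - Tail);
    return All;
  }

  // True if bytes are still waiting for the rest of their character.
  bool hasPending() const { return !Pending.empty(); }
};
```

With the two writes from the test, the first push() returns "ling" and holds back the 0xC3 byte; the second push() returns the remaining six bytes starting with the completed 0xC3 0xBC pair, after which nothing is pending.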

Revision Contents

Diff 13115

include/llvm/Support/FileSystem.h

include/llvm/Support/raw_ostream.h

lib/Support/Path.cpp

lib/Support/Unix/Path.inc

lib/Support/Windows/Path.inc

lib/Support/Windows/WindowsSupport.h

lib/Support/raw_ostream.cpp

unittests/Support/Path.cpp

unittests/Support/raw_ostream_test.cpp
