This is an archive of the discontinued LLVM Phabricator instance.

Differential D39449

Rewrite FileOutputBuffer as two separate classes.
Closed · Public

Authored by ruiu on Oct 30 2017, 8:01 PM.

Details

Reviewers

• rafael
• zturner

Commits

rGa16fe65b7245: Rewrite FileOutputBuffer as two separate classes.
rL317127: Rewrite FileOutputBuffer as two separate classes.

Summary

This patch rewrites FileOutputBuffer as two separate classes: one for
a file-backed output buffer and the other for a memory-backed output
buffer. I think the new code is easier to follow because the two
implementations are now actually separated into different classes.

Unlike the previous implementation, the class that does not replace the
final output file using rename(2) does not create a temporary file at
all. Instead, it allocates memory using mmap(2) and uses it. I think
this is an improvement because it is now guaranteed that the temporary
memory region doesn't trigger any I/O, and there's now zero chance of
leaving a temporary file behind. Also, it shouldn't impose new
restrictions because we were already using mmap I/O.

Diff Detail

Build Status

Buildable 11647
Build 11647: arc lint + arc unit

Event Timeline

ruiu created this revision. Oct 30 2017, 8:01 PM

Herald added a subscriber: hiraditya. Oct 30 2017, 8:01 PM

zturner added a subscriber: zturner. Oct 30 2017, 8:21 PM

zturner added inline comments.

llvm/lib/Support/FileOutputBuffer.cpp
84–85	Why can't we just use `new` here?
145	This is a bit confusing, in the sense that the function is called `isFileRegular` but then it can return true even if `is_regular_file` returns false.
170	It seems like we can just inline this:

    fs::file_status Stat;
    if (auto EC = fs::status(Path, Stat))
      return EC;
    if (fs::is_directory(Stat))
      return errc::is_a_directory;
    if (!fs::exists(Stat) || fs::is_regular_file(Stat))
      return OnDiskBuffer::create(Path, Size, Mode);
    return InMemoryBuffer::create(Path, Size, Mode);

The extra function here just confuses things IMO. That said, if it reaches the `InMemoryBuffer` code path, then what exactly is `Path`? It definitely exists, but it's neither a file nor a directory. So it's probably either a symlink, block file, character file, fifo, socket, or unknown file. Why would we want to overwrite one of those with something?
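The dispatch suggested in the comment on line 170 can be tried out without LLVM. The following is a minimal sketch using C++17 `std::filesystem` rather than `llvm::sys::fs`; `Backend` and `chooseBackend` are hypothetical names introduced only for illustration, and the error-reporting details of `std::filesystem::status` are not assumed to match those of LLVM's `fs::status`.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical labels for the two implementations discussed in the review.
enum class Backend { OnDisk, InMemory };

// Hypothetical helper: picks a backend for Path, or returns std::nullopt with
// EC set when no buffer should be created at all.
std::optional<Backend> chooseBackend(const std::string &Path,
                                     std::error_code &EC) {
  fs::file_status Stat = fs::status(Path, EC);

  // A missing file is not an error here: it will simply be created, so the
  // temporary-file-plus-rename strategy is usable.
  if (Stat.type() == fs::file_type::not_found) {
    EC.clear();
    return Backend::OnDisk;
  }
  if (EC)
    return std::nullopt;

  // Directories can never be an output file.
  if (fs::is_directory(Stat)) {
    EC = std::make_error_code(std::errc::is_a_directory);
    return std::nullopt;
  }

  // Regular files can be replaced by rename; anything else (character
  // device, fifo, socket, ...) has to be written in place.
  if (fs::is_regular_file(Stat))
    return Backend::OnDisk;
  return Backend::InMemory;
}
```

With this sketch, a path such as `/dev/null` on a typical Linux system selects `Backend::InMemory`, which is the kind of target the in-memory path exists for. Note that `std::filesystem::status` follows symlinks, so a symlink is classified by what it points to.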

ruiu added inline comments. Oct 31 2017, 9:24 AM

llvm/lib/Support/FileOutputBuffer.cpp
84–85	I didn't design `Memory` class, but it looks like that class can be instantiated only via allocateMappedMemory.
170	Thank you for the suggestion. Your code looks indeed better. Updated the patch. Also added comment to answer your question.

Address Zach's comments

Harbormaster completed remote builds in B11657: Diff 120995. Oct 31 2017, 9:24 AM

Fix typo

Harbormaster completed remote builds in B11667: Diff 121016. Oct 31 2017, 10:42 AM

zturner added inline comments. Oct 31 2017, 11:12 AM

llvm/lib/Support/FileOutputBuffer.cpp
62–119	Maybe want to add `override` to this and the other class's destructor
84–85	Right, but why do we even need to use the `Memory` class at all? It seems like overkill. The comments say:

    /// This method allocates a block of memory that is suitable for loading
    /// dynamically generated code (e.g. JIT).

Since we don't care about the protection (RWX) of the temporary storage, and only about the protection of the final file on disk, it seems like this is unnecessary. Instead of storing `OwningMemoryBlock` as a member of the class, you could just store `std::vector<uint8_t>`. There are fewer failure paths this way, as it's literally just one call to `new`, whereas `allocateMappedMemory` has several ways it can return errors.

ruiu added inline comments. Oct 31 2017, 11:13 AM

llvm/lib/Support/FileOutputBuffer.cpp
84–85	How can you catch memory exhaustion error if you use `new`?

zturner accepted this revision. Oct 31 2017, 11:28 AM

zturner added inline comments.

llvm/lib/Support/FileOutputBuffer.cpp
84–85	Hmm, yea. I guess that's a problem especially since file can be really large. Honestly I've always had a strong dislike for how in LLVM we write to files by mapping the entire contents into memory and then writing to it. There was actually a bug just yesterday because of this exact reason, where someone was trying to write a TAR file > 4GB, and so a bunch of hacks have to be added to lit to deal with the address space of the process. If something is a file, you should just be able to write to it. Especially in this case, where "writing directly to the underlying file" (as opposed to writing to a separate file first and then renaming) is actually the desired behavior. Anyway, I guess for now since we need to handle OOM error this is fine.
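The point settled in this thread is about how an allocation failure reaches the caller. A plain `new` (and therefore `std::vector`) reports exhaustion by throwing `std::bad_alloc`, which is awkward in a codebase that returns `std::error_code` values, whereas a mapping call reports failure through its return value. The sketch below illustrates that second style with plain POSIX `mmap` instead of LLVM's `Memory::allocateMappedMemory`; `allocateOrError` is a hypothetical name, and the code assumes a system that provides `MAP_ANONYMOUS`.

```cpp
#include <cerrno>
#include <cstddef>
#include <system_error>

#include <sys/mman.h>

// Hypothetical helper: returns a readable and writable anonymous mapping of
// Size bytes, or nullptr with EC set. Failure is reported by value, so the
// caller never has to catch an exception.
void *allocateOrError(size_t Size, std::error_code &EC) {
  EC.clear();
  void *P = ::mmap(nullptr, Size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (P == MAP_FAILED) {
    // Capture errno immediately, before any other call can overwrite it.
    EC = std::error_code(errno, std::generic_category());
    return nullptr;
  }
  return P;
}
```

This only surfaces failures that the kernel detects when the mapping is created. The caller is responsible for releasing the region with `munmap` using the same size.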

This revision is now accepted and ready to land. Oct 31 2017, 11:28 AM

Closed by commit rL317127: Rewrite FileOutputBuffer as two separate classes. (authored by ruiu). Nov 1 2017, 2:38 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/include/llvm/Support/FileOutputBuffer.h	34 lines
llvm/lib/Support/FileOutputBuffer.cpp	203 lines

Diff 120931

llvm/include/llvm/Support/FileOutputBuffer.h

	/// buffer which will be written to a file. During the lifetime of these			/// buffer which will be written to a file. During the lifetime of these
	/// objects, the content or existence of the specified file is undefined. That			/// objects, the content or existence of the specified file is undefined. That
	/// is, creating an OutputBuffer for a file may immediately remove the file.			/// is, creating an OutputBuffer for a file may immediately remove the file.
	/// If the FileOutputBuffer is committed, the target file's content will become			/// If the FileOutputBuffer is committed, the target file's content will become
	/// the buffer content at the time of the commit. If the FileOutputBuffer is			/// the buffer content at the time of the commit. If the FileOutputBuffer is
	/// not committed, the file will be deleted in the FileOutputBuffer destructor.			/// not committed, the file will be deleted in the FileOutputBuffer destructor.
	class FileOutputBuffer {			class FileOutputBuffer {
	public:			public:

	enum {			enum {
	F_executable = 1 /// set the 'x' bit on the resulting file			F_executable = 1 /// set the 'x' bit on the resulting file
	};			};

	/// Factory method to create an OutputBuffer object which manages a read/write			/// Factory method to create an OutputBuffer object which manages a read/write
	/// buffer of the specified size. When committed, the buffer will be written			/// buffer of the specified size. When committed, the buffer will be written
	/// to the file at the specified path.			/// to the file at the specified path.
	static ErrorOr<std::unique_ptr<FileOutputBuffer>>			static ErrorOr<std::unique_ptr<FileOutputBuffer>>
	create(StringRef FilePath, size_t Size, unsigned Flags = 0);			create(StringRef FilePath, size_t Size, unsigned Flags = 0);

	/// Returns a pointer to the start of the buffer.			/// Returns a pointer to the start of the buffer.
	uint8_t *getBufferStart() {			virtual uint8_t *getBufferStart() const = 0;
	return (uint8_t*)Region->data();
	}

	/// Returns a pointer to the end of the buffer.			/// Returns a pointer to the end of the buffer.
	uint8_t *getBufferEnd() {			virtual uint8_t *getBufferEnd() const = 0;
	return (uint8_t*)Region->data() + Region->size();
	}

	/// Returns size of the buffer.			/// Returns size of the buffer.
	size_t getBufferSize() const {			virtual size_t getBufferSize() const = 0;
	return Region->size();
	}

	/// Returns path where file will show up if buffer is committed.			/// Returns path where file will show up if buffer is committed.
	StringRef getPath() const {			StringRef getPath() const { return FinalPath; }
	return FinalPath;
	}

	/// Flushes the content of the buffer to its file and deallocates the			/// Flushes the content of the buffer to its file and deallocates the
	/// buffer. If commit() is not called before this object's destructor			/// buffer. If commit() is not called before this object's destructor
	/// is called, the file is deleted in the destructor. The optional parameter			/// is called, the file is deleted in the destructor. The optional parameter
	/// is used if it turns out you want the file size to be smaller than			/// is used if it turns out you want the file size to be smaller than
	/// initially requested.			/// initially requested.
	std::error_code commit();			virtual std::error_code commit() = 0;

	/// If this object was previously committed, the destructor just deletes			/// If this object was previously committed, the destructor just deletes
	/// this object. If this object was not committed, the destructor			/// this object. If this object was not committed, the destructor
	/// deallocates the buffer and the target file is never written.			/// deallocates the buffer and the target file is never written.
	~FileOutputBuffer();			virtual ~FileOutputBuffer() {}

				protected:
				FileOutputBuffer(StringRef Path) : FinalPath(Path) {}

	private:			std::string FinalPath;
	FileOutputBuffer(const FileOutputBuffer &) = delete;
	FileOutputBuffer &operator=(const FileOutputBuffer &) = delete;

	FileOutputBuffer(std::unique_ptr<llvm::sys::fs::mapped_file_region> R,
	StringRef Path, StringRef TempPath, bool IsRegular);

	std::unique_ptr<llvm::sys::fs::mapped_file_region> Region;
	SmallString<128> FinalPath;
	SmallString<128> TempPath;
	bool IsRegular;
	};			};
	} // end namespace llvm			} // end namespace llvm

	#endif			#endif
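The shape of the refactored header (pure virtual accessors, a protected constructor, and a static factory that hides the concrete class) can be illustrated with a standalone miniature. This is a sketch under stated assumptions, not the LLVM API: `OutputBuffer` and `HeapBuffer` are hypothetical names, the factory always returns the same implementation, and the storage is a `std::vector` written out with `std::ofstream` purely to keep the example self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <memory>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical miniature of the interface: callers see only this class.
class OutputBuffer {
public:
  // Factory: the concrete type is an implementation detail.
  static std::unique_ptr<OutputBuffer> create(std::string Path, size_t Size);

  virtual uint8_t *getBufferStart() = 0;
  virtual size_t getBufferSize() const = 0;
  virtual std::error_code commit() = 0;
  virtual ~OutputBuffer() = default;

  const std::string &getPath() const { return FinalPath; }

protected:
  explicit OutputBuffer(std::string Path) : FinalPath(std::move(Path)) {}

private:
  std::string FinalPath;
};

namespace {
// Hypothetical implementation that keeps the bytes on the heap and writes
// them to the destination only when commit() is called.
class HeapBuffer final : public OutputBuffer {
public:
  HeapBuffer(std::string Path, size_t Size)
      : OutputBuffer(std::move(Path)), Bytes(Size) {}

  uint8_t *getBufferStart() override { return Bytes.data(); }
  size_t getBufferSize() const override { return Bytes.size(); }

  std::error_code commit() override {
    std::ofstream Out(getPath(), std::ios::binary | std::ios::trunc);
    Out.write(reinterpret_cast<const char *>(Bytes.data()),
              static_cast<std::streamsize>(Bytes.size()));
    Out.flush();
    if (!Out)
      return std::make_error_code(std::errc::io_error);
    return {};
  }

private:
  std::vector<uint8_t> Bytes;
};
} // namespace

std::unique_ptr<OutputBuffer> OutputBuffer::create(std::string Path,
                                                   size_t Size) {
  return std::make_unique<HeapBuffer>(std::move(Path), Size);
}
```

A caller fills the region starting at `getBufferStart()` and then calls `commit()`. If `commit()` is never called, this sketch never touches the destination path.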

llvm/lib/Support/FileOutputBuffer.cpp

	// Utility for creating a in-memory buffer that will be written to a file.			// Utility for creating a in-memory buffer that will be written to a file.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/Support/FileOutputBuffer.h"			#include "llvm/Support/FileOutputBuffer.h"
	#include "llvm/ADT/STLExtras.h"			#include "llvm/ADT/STLExtras.h"
	#include "llvm/ADT/SmallString.h"			#include "llvm/ADT/SmallString.h"
	#include "llvm/Support/Errc.h"			#include "llvm/Support/Errc.h"
				#include "llvm/Support/Memory.h"
	#include "llvm/Support/Path.h"			#include "llvm/Support/Path.h"
	#include "llvm/Support/Signals.h"			#include "llvm/Support/Signals.h"
	#include <system_error>			#include <system_error>

	#if !defined(_MSC_VER) && !defined(__MINGW32__)			#if !defined(_MSC_VER) && !defined(__MINGW32__)
	#include <unistd.h>			#include <unistd.h>
	#else			#else
	#include <io.h>			#include <io.h>
	#endif			#endif

	using llvm::sys::fs::mapped_file_region;			using namespace llvm;
				using namespace llvm::sys;

	namespace llvm {			// A FileOutputBuffer which creates a temporary file in the same directory
	FileOutputBuffer::FileOutputBuffer(std::unique_ptr<mapped_file_region> R,			// as the final output file. The final output file is atomically replaced
	StringRef Path, StringRef TmpPath,			// with the temporary file on commit().
	bool IsRegular)			class OnDiskBuffer : public FileOutputBuffer {
	: Region(std::move(R)), FinalPath(Path), TempPath(TmpPath),			public:
	IsRegular(IsRegular) {}			OnDiskBuffer(StringRef Path, StringRef TempPath,
				std::unique_ptr<fs::mapped_file_region> Buf)
				: FileOutputBuffer(Path), Buffer(std::move(Buf)), TempPath(TempPath) {}

	FileOutputBuffer::~FileOutputBuffer() {			static ErrorOr<std::unique_ptr<OnDiskBuffer>>
				create(StringRef Path, size_t Size, unsigned Mode);

				uint8_t *getBufferStart() const override { return (uint8_t *)Buffer->data(); }

				uint8_t *getBufferEnd() const override {
				return (uint8_t *)Buffer->data() + Buffer->size();
				}

				size_t getBufferSize() const override { return Buffer->size(); }

				std::error_code commit() override {
				// Unmap buffer, letting OS flush dirty pages to file on disk.
				Buffer.reset();

				// Atomically replace the existing file with the new one.
				auto EC = fs::rename(TempPath, FinalPath);
				sys::DontRemoveFileOnSignal(TempPath);
				return EC;
				}

				~OnDiskBuffer() {
	// Close the mapping before deleting the temp file, so that the removal			// Close the mapping before deleting the temp file, so that the removal
	// succeeds.			// succeeds.
	Region.reset();			Buffer.reset();
	sys::fs::remove(Twine(TempPath));			fs::remove(TempPath);
	}			}

	ErrorOr<std::unique_ptr<FileOutputBuffer>>			private:
	FileOutputBuffer::create(StringRef FilePath, size_t Size, unsigned Flags) {			std::unique_ptr<fs::mapped_file_region> Buffer;
	// Check file is not a regular file, in which case we cannot remove it.			std::string TempPath;
	sys::fs::file_status Stat;			};
	std::error_code EC = sys::fs::status(FilePath, Stat);
	bool IsRegular = true;			// A FileOutputBuffer which keeps data in memory and writes to the final
	switch (Stat.type()) {			// output file on commit(). This is used only when we cannot use OnDiskBuffer.
	case sys::fs::file_type::file_not_found:			class InMemoryBuffer : public FileOutputBuffer {
	// If file does not exist, we'll create one.			public:
	break;			InMemoryBuffer(StringRef Path, MemoryBlock Buf, unsigned Mode)
	case sys::fs::file_type::regular_file: {			: FileOutputBuffer(Path), Buffer(Buf), Mode(Mode) {}
	// If file is not currently writable, error out.
	// FIXME: There is no sys::fs:: api for checking this.			static ErrorOr<std::unique_ptr<InMemoryBuffer>>
	// FIXME: In posix, you use the access() call to check this.			create(StringRef Path, size_t Size, unsigned Mode) {
	}			std::error_code EC;
	break;			MemoryBlock MB = Memory::allocateMappedMemory(
	case sys::fs::file_type::directory_file:			Size, nullptr, sys::Memory::MF_READ | sys::Memory::MF_WRITE, EC);
	return errc::is_a_directory;
	default:
	if (EC)			if (EC)
	return EC;			return EC;
	IsRegular = false;			return llvm::make_unique<InMemoryBuffer>(Path, MB, Mode);
	}			}

	SmallString<128> TempFilePath;			uint8_t *getBufferStart() const override { return (uint8_t *)Buffer.base(); }

				uint8_t *getBufferEnd() const override {
				return (uint8_t *)Buffer.base() + Buffer.size();
				}

				size_t getBufferSize() const override { return Buffer.size(); }

				std::error_code commit() override {
	int FD;			int FD;
	if (IsRegular) {			std::error_code EC;
	unsigned Mode = sys::fs::all_read | sys::fs::all_write;			if (auto EC = openFileForWrite(FinalPath, FD, fs::F_None, Mode))
	// If requested, make the output file executable.			return EC;
	if (Flags & F_executable)			raw_fd_ostream OS(FD, /*shouldClose=*/true, /*unbuffered=*/true);
	Mode |= sys::fs::all_exe;			OS << StringRef((const char *)Buffer.base(), Buffer.size());
	// Create new file in same directory but with random name.			return std::error_code();
	EC = sys::fs::createUniqueFile(Twine(FilePath) + ".tmp%%%%%%%", FD,
	TempFilePath, Mode);
	} else {
	// Create a temporary file. Since this is a special file, we will not move
	// it and the new file can be in another filesystem. This avoids trying to
	// create a temporary file in /dev when outputting to /dev/null for example.
	EC = sys::fs::createTemporaryFile(sys::path::filename(FilePath), "", FD,
	TempFilePath);
	}			}

	if (EC)			private:
				OwningMemoryBlock Buffer;
				unsigned Mode;
				};

				ErrorOr<std::unique_ptr<OnDiskBuffer>>
				OnDiskBuffer::create(StringRef Path, size_t Size, unsigned Mode) {
				// Create new file in same directory but with random name.
				SmallString<128> TempPath;
				int FD;
				if (auto EC = fs::createUniqueFile(Path + ".tmp%%%%%%%", FD, TempPath, Mode))
	return EC;			return EC;

	sys::RemoveFileOnSignal(TempFilePath);			sys::RemoveFileOnSignal(TempPath);

	#ifndef LLVM_ON_WIN32			#ifndef LLVM_ON_WIN32
	// On Windows, CreateFileMapping (the mmap function on Windows)			// On Windows, CreateFileMapping (the mmap function on Windows)
	// automatically extends the underlying file. We don't need to			// automatically extends the underlying file. We don't need to
	// extend the file beforehand. _chsize (ftruncate on Windows) is			// extend the file beforehand. _chsize (ftruncate on Windows) is
	// pretty slow just like it writes specified amount of bytes,			// pretty slow just like it writes specified amount of bytes,
	// so we should avoid calling that.			// so we should avoid calling that function.
	EC = sys::fs::resize_file(FD, Size);			if (auto EC = fs::resize_file(FD, Size))
	if (EC)
	return EC;			return EC;
	#endif			#endif

	auto MappedFile = llvm::make_unique<mapped_file_region>(			// Mmap it.
	FD, mapped_file_region::readwrite, Size, 0, EC);			std::error_code EC;
	int Ret = close(FD);			auto MappedFile = llvm::make_unique<fs::mapped_file_region>(
				FD, fs::mapped_file_region::readwrite, Size, 0, EC);
				close(FD);
	if (EC)			if (EC)
	return EC;			return EC;
	if (Ret)			return llvm::make_unique<OnDiskBuffer>(Path, TempPath, std::move(MappedFile));
	return std::error_code(errno, std::generic_category());

	std::unique_ptr<FileOutputBuffer> Buf(new FileOutputBuffer(
	std::move(MappedFile), FilePath, TempFilePath, IsRegular));
	return std::move(Buf);
	}			}

	std::error_code FileOutputBuffer::commit() {			// Returns true if Path does not exist or is a regular file.
	// Unmap buffer, letting OS flush dirty pages to file on disk.			static bool isFileRegular(StringRef Path, std::error_code &EC) {
	Region.reset();			fs::file_status Stat;
				EC = fs::status(Path, Stat);

	std::error_code EC;			switch (Stat.type()) {
	if (IsRegular) {			case fs::file_type::regular_file:
	// Atomically replace the existing file with the new one.			return true;
	EC = sys::fs::rename(Twine(TempPath), Twine(FinalPath));			case fs::file_type::file_not_found:
	sys::DontRemoveFileOnSignal(TempPath);			EC = std::error_code();
	} else {			return true;
	EC = sys::fs::copy_file(TempPath, FinalPath);			case fs::file_type::directory_file:
	std::error_code RMEC = sys::fs::remove(TempPath);			EC = errc::is_a_directory;
	sys::DontRemoveFileOnSignal(TempPath);			return false;
	if (RMEC)			default:
	return RMEC;			return false;
				}
	}			}

				ErrorOr<std::unique_ptr<FileOutputBuffer>>
				FileOutputBuffer::create(StringRef Path, size_t Size, unsigned Flags) {
				unsigned Mode = fs::all_read | fs::all_write;
				if (Flags & F_executable)
				Mode |= fs::all_exe;

				std::error_code EC;
				bool IsRegular = isFileRegular(Path, EC);
				if (EC)
	return EC;			return EC;

				if (IsRegular)
				return OnDiskBuffer::create(Path, Size, Mode);
				return InMemoryBuffer::create(Path, Size, Mode);
	}			}
	} // namespace
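The file-backed strategy in `OnDiskBuffer` (create a uniquely named file next to the destination, size it, map it, fill it, unmap, then rename over the destination) can be sketched with plain POSIX calls. This is an illustration under assumptions rather than the patch's code: `writeAtomically` and `lastError` are hypothetical names, `mkstemp` stands in for `fs::createUniqueFile` (so the temporary file gets `mkstemp`'s default permissions instead of the requested mode), and the signal-handler cleanup that `sys::RemoveFileOnSignal` provides is omitted.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <system_error>

#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: wraps the current errno in a std::error_code.
static std::error_code lastError() {
  return std::error_code(errno, std::generic_category());
}

// Hypothetical helper: writes Size bytes from Data to Path by filling a
// mapped temporary file in the same directory and renaming it over Path.
std::error_code writeAtomically(const std::string &Path, const void *Data,
                                size_t Size) {
  // Same directory as the destination, so rename(2) stays on one filesystem.
  std::string Temp = Path + ".tmpXXXXXX";
  int FD = ::mkstemp(&Temp[0]);
  if (FD < 0)
    return lastError();

  std::error_code EC;
  // Extend the file to its final size before mapping it.
  if (::ftruncate(FD, static_cast<off_t>(Size)) != 0)
    EC = lastError();

  if (!EC && Size > 0) {
    void *Map =
        ::mmap(nullptr, Size, PROT_READ | PROT_WRITE, MAP_SHARED, FD, 0);
    if (Map == MAP_FAILED) {
      EC = lastError();
    } else {
      std::memcpy(Map, Data, Size);
      // Unmapping lets the OS flush the dirty pages to the temporary file.
      if (::munmap(Map, Size) != 0)
        EC = lastError();
    }
  }

  if (::close(FD) != 0 && !EC)
    EC = lastError();

  // Replace the destination in one step.
  if (!EC && ::rename(Temp.c_str(), Path.c_str()) != 0)
    EC = lastError();

  // On any failure, do not leave the temporary file behind.
  if (EC)
    ::unlink(Temp.c_str());
  return EC;
}
```

Because the rename is the last step, a reader of `Path` sees either the old contents or the complete new contents. This only works when the destination's directory is writable and the destination can be replaced by rename, which is why the patch keeps a separate in-memory path for other kinds of targets.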