This is an archive of the discontinued LLVM Phabricator instance.

Rewrite FileOutputBuffer as two separate classes.
ClosedPublic

Authored by ruiu on Oct 30 2017, 8:01 PM.

Details

Summary

This patch is to rewrite FileOutputBuffer as two separate classes;
one for file-backed output buffer and the other for memory-backed
output buffer. I think the new code is easier to follow because two
different implementations are now actually separated as different
classes.

Unlike the previous implementation, the class that does not replace the
final output file using rename(2) does not create a temporary file at
all. Instead, it allocates memory using mmap(2) and use it. I think
this is an improvement because it is now guaranteed that the temporary
memory region doesn't trigger any I/O and there's now zero chance to
leave a temporary file behind. Also, it shouldn't impose new restrictions
because were using mmap IO too.

Diff Detail

Repository
rL LLVM

Event Timeline

ruiu created this revision.Oct 30 2017, 8:01 PM
zturner added inline comments.
llvm/lib/Support/FileOutputBuffer.cpp
84–85 ↗(On Diff #120931)

Why can't we just use new here?

145 ↗(On Diff #120931)

This is a bit confusing, in the sense that the function is called isFileRegular but then it can return true even if is_regular_file returns false.

170 ↗(On Diff #120931)

It seems like we can just inline this.

fs::file_status Stat;
if (auto EC = fs::status(Path, Stat))
  return EC;
if (fs::is_directory(Stat))
  return errc::is_a_directory;

if (!fs::exists(Stat) || fs::is_regular_file(Stat))
  return OnDiskBuffer::create(Path, Size, Mode);
return InMemoryBuffer::create(Path, Size, Mode);

the extra function here just confuses things IMO.

That said, if it reaches the InMemoryBuffer code path, then what exactly *is* Path? It definitely exists, but it's neither a file nor a directory. So it's probably either a symlink, block file, character file, fifo, socket, or unknown file. Why would we be want to overwrite one of those with something?

ruiu added inline comments.Oct 31 2017, 9:24 AM
llvm/lib/Support/FileOutputBuffer.cpp
84–85 ↗(On Diff #120931)

I didn't design Memory class, but it looks like that class can be instantiated only via allocateMappedMemory.

170 ↗(On Diff #120931)

Thank you for the suggestion. Your code looks indeed better. Updated the patch. Also added comment to answer your question.

ruiu updated this revision to Diff 120995.Oct 31 2017, 9:24 AM
  • Address Zach's comments
ruiu updated this revision to Diff 121016.Oct 31 2017, 10:42 AM
  • Fix typo
zturner added inline comments.Oct 31 2017, 11:12 AM
llvm/lib/Support/FileOutputBuffer.cpp
62 ↗(On Diff #121016)

Maybe want to add override to this and the other class's destructor

84–85 ↗(On Diff #120931)

Right, but why do we even need to use the Memory class at all? It seems like overkill. The comments say:

/// This method allocates a block of memory that is suitable for loading
/// dynamically generated code (e.g. JIT).

Since we don't care about the protection (RWX) of the temporary storage, and only about the protection of the final file on disk, it seems like this is unnecessary.

Instead of storing OwningMemoryBlock as a member of the class, you could just store std::vector<uint8_t>.

There are fewer failure paths this way, as it's literally just one call to new, whereas allocateMappedMemory has several ways it can return errors.

ruiu added inline comments.Oct 31 2017, 11:13 AM
llvm/lib/Support/FileOutputBuffer.cpp
84–85 ↗(On Diff #120931)

How can you catch memory exhaustion error if you use new?

zturner accepted this revision.Oct 31 2017, 11:28 AM
zturner added inline comments.
llvm/lib/Support/FileOutputBuffer.cpp
84–85 ↗(On Diff #120931)

Hmm, yea. I guess that's a problem especially since file can be really large.

Honestly I've always had a strong dislike for how in LLVM we write to files by mapping the entire contents into memory and then writing to it.

There was actually a bug just yesterday because of this exact reason, where someone was trying to write a TAR file > 4GB, and so a bunch of hacks have to be added to lit to deal with the address space of the process. If something is a file, you should just be able to write to it. Especially in this case, where "writing directly to the underlying file" (as opposed to writing to a separate file first and then renaming) is actually the desired behavior.

Anyway, I guess for now since we need to handle OOM error this is fine.

This revision is now accepted and ready to land.Oct 31 2017, 11:28 AM
This revision was automatically updated to reflect the committed changes.