This is an archive of the discontinued LLVM Phabricator instance.

[LLVM-AR] Fixed bug where temporary file would be left behind every time an archive was updated
ClosedPublic

Authored by cameron314 on May 2 2016, 3:21 PM.

Details

Summary

When updating an existing archive, llvm-ar opens the old archive into a MemoryBuffer, does its thing, and writes the results to a temporary file. That file is then renamed to the original archive filename, thus replacing it with the updated contents. However, on Windows at least, what would happen is that the MemoryBuffer for the old archive would actually be an mmap'ed view of the file, so when it came time to do the rename via Win32's ReplaceFile, it would succeed but would be unable to fully replace the file since there would still be a handle open on it; instead, the old version got renamed to a random temporary name and left behind.

This patch fixes the problem by allowing the backing memory buffer to be destroyed after it's no longer needed, just before the rename takes place.

Diff Detail

Event Timeline

cameron314 updated this revision to Diff 55901.May 2 2016, 3:21 PM
cameron314 retitled this revision from to [LLVM-AR] Fixed bug where temporary file would be left behind every time an archive was updated.
cameron314 updated this object.
cameron314 added a reviewer: Bigcheese.
cameron314 set the repository for this revision to rL LLVM.
cameron314 added a subscriber: llvm-commits.
rafael added inline comments.
include/llvm/Object/Archive.h
179 ↗(On Diff #55901)

We really should not do this.

All these objects just wrap a MemoryBufferRef. It is llvm-ar that should be deleting these buffers earlier.

silvas added a subscriber: silvas.May 3 2016, 1:22 PM

FWIW Michael and I have noticed llvm-ar leaving behind strange .TMP files created internally to ReplaceFile. Looking forward to trying this out!

@silvas: Yes, that's exactly what I discovered too while debugging this.

include/llvm/Object/Archive.h
179 ↗(On Diff #55901)

Ideally, I agree. But the design is such that llvm-ar has no control over when the buffer is done being used. It passes an Archive off to writeArchive which consumes it and does the file replacement in one step. I thought about splitting that function in two, but that's even uglier.

This was the cleanest way of fixing the bug I could come up with.

silvas added a comment.May 3 2016, 2:57 PM

@silvas: Yes, that's exactly what I discovered too while debugging this.

Is there a way we can fix this in rename in lib/Support/Windows/Path.inc? I just noticed that clang-tidy has similar issues.

All these objects just wrap a MemoryBufferRef. It is llvm-ar that should be deleting these buffers earlier.

Ideally, I agree. But the design is such that llvm-ar has no control over when the buffer is done being used. It passes an Archive off to writeArchive which consumes it and does the file replacement in one step. I thought about splitting that function in two, but that's even uglier.

Maybe pass the buffer ownership to writeArchive?

In any case, we really should not have an Archive that can optionally
own the buffer.

Cheers,
Rafael

@silvas: The problem, unfortunately, is not with the way rename is implemented -- when the destination file has an open file handle on it, there's only so much that can be done regardless of the method.

@rafael: I could do that. It made more sense to me that it be linked to the archive (since once freed, the archive is no longer valid).

cameron314 updated this revision to Diff 56161.May 4 2016, 9:17 AM
cameron314 updated this object.
cameron314 added a reviewer: rafael.
cameron314 removed rL LLVM as the repository for this revision.

All right, I've changed the patch according to your suggestion. The buffer is no longer linked to the Archive object, and is passed separately. (Note this leaves the Archive object in an unusable state after the buffer is removed from under it, but that llvm-ar doesn't use it after that point.)

Long term, it might be a good idea to introduce another flag to MemoryBuffer::getFile to always force it to copy the file into memory instead of mapping it. This would be less efficient most of the time, but would fix this sort of bug which is otherwise very annoying to handle if there's any abstraction at all in the code that consumes the buffer before overwriting the file. It seems there's a lot of read-modify-update file access idioms across the various tools that are prone to this problem.

rafael edited edge metadata.May 4 2016, 3:37 PM

This looks good to me with a few nits:

Expand the comment to describe the windows limitation. I think most people
looking at this code will have POSIX semantics in mind.

Second, is it possible to add a test the currently fails on windows? What
does the renamed file looks like? Is it something we can guess with a glob?

Thanks a lot for fixing this.

Cheers,
Rafael

I thought this would be an issue on POSIX as well, but after reading the documentation and testing it on a Linux machine it actually isn't -- POSIX allows a file to be renamed to a file that is currently open. This seems really unsafe to me since whoever has the destination file already open is getting the contents on disk changed out from under them (unless my understanding is off) which could cause all sorts of nasty corruption issues if there's a non-empty read or write buffer (or unflushed mapped memory) on that filehandle. (On Windows, a file that is in use can generally be renamed but not deleted, and the open handles keep referring to the same file regardless of its name -- which is why ReplaceFile in this case renames the destination file to let it hang around for whoever has it open.)

I'll expand the comment, good call!

I really don't know how to test this nicely. It depends on whether shouldUseMmap returns true in MemoryBuffer.cpp. The temporary files seem to be of the form "destarcname.a~RFnnnnxxxx.TMP", but this is a Win32 implementation detail and is not documented on MSDN, so is probably subject to change.

cameron314 updated this revision to Diff 56418.May 6 2016, 8:21 AM
cameron314 edited edge metadata.

Here's the same diff but with a better and expanded comment.

rafael accepted this revision.May 6 2016, 10:14 AM
rafael edited edge metadata.

LGTM.

Thank you so much for tracking this down.

This revision is now accepted and ready to land.May 6 2016, 10:14 AM

Do you have commit access?

Cheers,
Rafael

Thanks! It was bugging us too ;-)
No I do not. Would you mind committing it for me?

r268916.

Thank you so much,
Rafael