Refactoring function writeArchive in ArchiveWriter. Added a new
function writeArchiveBuffer that returns the archive in a memory
buffer instead of writing it out to the disk. This refactor is necessary
so as to allow llvm-libtool-darwin to write universal files containing
Refactoring function writeArchive in ArchiveWriter. Added a new
My current approach involves factoring out the common code of writing the archive to a temporary FD. Now, in case of writeArchive, I simply write the temporary file to disk by invoking Temp->keep(ArcName). However, for writeArchiveBuffer, I read the archive again in a MemoryBuffer using
ErrorOr<std::unique_ptr<MemoryBuffer>> Ret = MemoryBuffer::getOpenFile(Temp->FD, ArcName, -1);
This memory buffer is then put in ReturnBuffer (which is the buffer allocated by the caller to receive the archive buffer).
Please let me know if there is a better approach to get the archive in a buffer than this current roundtrip method(writing to a FD and then reading back into a MemBuf). Thanks!
To add a bunch of context, one functionality we're adding to llvm-libtool-darwin is to produce universal binary outputs. For those who aren't familiar, universal binaries (also known as fat binaries) allow a single binary to have slices for multiple architectures. In libtool's case, if you pass it object files of different architectures as input, it'll group each architecture's object files together into a single-architecture static library, and then group those single-architecture static libraries into a universal static library. The functionality to create the universal static library exists in llvm-lipo, and we're refactoring that out into libObject in D84662 to allow llvm-libtool-darwin to use it as well.
What we'd ideally want is to be able to construct an object::Archive in memory (in libtool), and then pass that Archive along to writeUniversalBinary. Unfortunately, the libObject facilities for reading and writing archives seem to be completely independent. Reading uses Archive, Archive::Child, etc., whereas writing uses NewArchiveMember and writeArchive. There's no way, as far as I can see, to construct an Archive by specifying its members; you can only construct one from a block of memory. Similarly, there's no way to serialize an Archive out to a file.
It'd perhaps be nice to unify the archive reading/writing interfaces, but there might be good reasons for the current split, and doing so would also be an extensive change. Instead, what we're doing here is adding the ability for writeArchive to give us back a MemoryBuffer (instead of writing to a temp file), and then parsing that MemoryBuffer as an Archive. That's a bit wasteful (less so if we eliminate the temp file and just keep things in memory), but given the completely separate reading/writing interfaces, it was the best solution we could think of. We're definitely open to better ideas if people have them!
We might be able to do something with raw_svector_ostream. I'll experiment a bit.
This should be static.
I played with this a bit and got something working, but I'm not convinced it's better. My idea was to make writeArchiveInFD take a raw_ostream &Out parameter. All the temp file handling shifts into writeArchive, which opens the temp file, creates a raw_fd_ostream from it, and passes that to writeArchiveInFD. writeArchiveBuffer takes a SmallString<0> & paramater, creates a raw_svector_ostream from that, and passes it to writeArchiveInFD. It then uses MemoryBuffer::getMemBuffer to create a MemoryBuffer from the SmallString and returns that.
The problem is that the MemoryBuffer then doesn't own the memory it's pointing to, which doesn't fit very well with libObject's OwningBinary model. You have to manage the lifetime of the SmallString (which actually owns the data) separately, and make sure the MemoryBuffer doesn't outlive it. (You also have to potentially manage multiple such SmallStrings, once for each slice.) We could use MemoryBuffer::getMemBufferCopy to have the MemoryBuffer own the contents, but then it's doing a copy, which isn't a whole lot better than what you have here. (It's at least avoiding the temp file, but filesystem caches are also a thing, so I'm not sure how much that ends up mattering in practice.)
Perhaps writeTemporaryArchive (since it's an archive in a temp file)?
+1. Also, consider naming this writeArchiveToBuffer.
Honestly, I was a bit surprised to see writeArchiveBuffer messing about with temp files. The streaming approach would have been much closer to what I expected. That being said, if it's complicated to implement, I'm willing to let it through.
I just learned about the existence of SmallVectorMemoryBuffer, which seems perfect for our use case :)
@sameerarora101, we can have writeTemporaryArchive take a raw_ostream &Out parameter. The temporary file creation and handling shifts into writeArchive. writeArchiveBuffer creates a SmallVector<char, 0> and a raw_svector_ostream from it, and then returns a SmallVectorMemoryBuffer created from its SmallVector. That should allow us to be efficient and also avoid any memory management issues :)
If OldArchiveBuf is not used (for now), delete it. It is a good idea adding a comment that this function is similar to writeArchive but has a different return type.
The names need redoing again, since the logic has moved around a bit. I recommend this becomes writeArchive or writeARchiveToStream and the function that writes to a file becomes writeArchiveToFile or simply leave as writeArchive since the signatures are different.
Also, ArcName doesn't appear to be used any more?
I don't think this needs doing - the MemoryBuffer will be destroyed once it goes out-of-scope (I'm assuming that reset effectively just clears the buffer). In that case, there's no need for it to even be passed in. In fact, looking up the stack, there's no need for it to be passed into any of these functions as far as I can see?
Removed OldArchiveBuf from signature, thanks.
Yup, removed ArcName from the signature.
Isn't it the case that we would need OldArchiveBuf.reset(); in writeArchive and not in writeARchiveToStream? This is because, as per the comment, we need to close all handles to the archive before calling Temp->keep(name), right? I have shifted this to writeArchive now. Please let me know if this is a wrong approach.