Page MenuHomePhabricator

[llvm-libtool-darwin] Refactor ArchiveWriter
ClosedPublic

Authored by sameerarora101 on Jul 29 2020, 8:39 AM.

Details

Summary

Refactoring function writeArchive in ArchiveWriter. Added a new
function writeArchiveBuffer that returns the archive in a memory
buffer instead of writing it out to the disk. This refactor is necessary
so as to allow llvm-libtool-darwin to write universal files containing
archives.

Diff Detail

Event Timeline

sameerarora101 created this revision.Jul 29 2020, 8:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2020, 8:39 AM
sameerarora101 requested review of this revision.Jul 29 2020, 8:39 AM
sameerarora101 added inline comments.Jul 29 2020, 8:48 AM
llvm/lib/Object/ArchiveWriter.cpp
672–673

My current approach involves factoring out the common code of writing the archive to a temporary FD. Now, in case of writeArchive, I simply write the temporary file to disk by invoking Temp->keep(ArcName). However, for writeArchiveBuffer, I read the archive again in a MemoryBuffer using

ErrorOr<std::unique_ptr<MemoryBuffer>> Ret =
    MemoryBuffer::getOpenFile(Temp->FD, ArcName, -1);

This memory buffer is then put in ReturnBuffer (which is the buffer allocated by the caller to receive the archive buffer).

Please let me know if there is a better approach to get the archive in a buffer than this current roundtrip method(writing to a FD and then reading back into a MemBuf). Thanks!

To add a bunch of context, one functionality we're adding to llvm-libtool-darwin is to produce universal binary outputs. For those who aren't familiar, universal binaries (also known as fat binaries) allow a single binary to have slices for multiple architectures. In libtool's case, if you pass it object files of different architectures as input, it'll group each architecture's object files together into a single-architecture static library, and then group those single-architecture static libraries into a universal static library. The functionality to create the universal static library exists in llvm-lipo, and we're refactoring that out into libObject in D84662 to allow llvm-libtool-darwin to use it as well.

What we'd ideally want is to be able to construct an object::Archive in memory (in libtool), and then pass that Archive along to writeUniversalBinary. Unfortunately, the libObject facilities for reading and writing archives seem to be completely independent. Reading uses Archive, Archive::Child, etc., whereas writing uses NewArchiveMember and writeArchive. There's no way, as far as I can see, to construct an Archive by specifying its members; you can only construct one from a block of memory. Similarly, there's no way to serialize an Archive out to a file.

It'd perhaps be nice to unify the archive reading/writing interfaces, but there might be good reasons for the current split, and doing so would also be an extensive change. Instead, what we're doing here is adding the ability for writeArchive to give us back a MemoryBuffer (instead of writing to a temp file), and then parsing that MemoryBuffer as an Archive. That's a bit wasteful (less so if we eliminate the temp file and just keep things in memory), but given the completely separate reading/writing interfaces, it was the best solution we could think of. We're definitely open to better ideas if people have them!

llvm/lib/Object/ArchiveWriter.cpp
672–673

We might be able to do something with raw_svector_ostream. I'll experiment a bit.

smeenai added inline comments.Jul 29 2020, 8:39 PM
llvm/lib/Object/ArchiveWriter.cpp
556–559

This should be static.

672–673

I played with this a bit and got something working, but I'm not convinced it's better. My idea was to make writeArchiveInFD take a raw_ostream &Out parameter. All the temp file handling shifts into writeArchive, which opens the temp file, creates a raw_fd_ostream from it, and passes that to writeArchiveInFD. writeArchiveBuffer takes a SmallString<0> & paramater, creates a raw_svector_ostream from that, and passes it to writeArchiveInFD. It then uses MemoryBuffer::getMemBuffer to create a MemoryBuffer from the SmallString and returns that.

The problem is that the MemoryBuffer then doesn't own the memory it's pointing to, which doesn't fit very well with libObject's OwningBinary model. You have to manage the lifetime of the SmallString (which actually owns the data) separately, and make sure the MemoryBuffer doesn't outlive it. (You also have to potentially manage multiple such SmallStrings, once for each slice.) We could use MemoryBuffer::getMemBufferCopy to have the MemoryBuffer own the contents, but then it's doing a copy, which isn't a whole lot better than what you have here. (It's at least avoiding the temp file, but filesystem caches are also a thing, so I'm not sure how much that ends up mattering in practice.)

sameerarora101 marked 2 inline comments as done.Jul 30 2020, 7:15 AM
sameerarora101 added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
672–673

Ooh makes sense, thanks a lot for explaining in such detail.

sameerarora101 marked an inline comment as done.

Making writeArchiveInFD static

MaskRay added inline comments.Jul 31 2020, 6:42 PM
llvm/lib/Object/ArchiveWriter.cpp
667

I can't find writeArchiveInFD. Did you forget to add a dependent patch?

smeenai added inline comments.Jul 31 2020, 6:48 PM
llvm/lib/Object/ArchiveWriter.cpp
667

The old writeArchive is changed to it, in line 556 above.

MaskRay added inline comments.Jul 31 2020, 6:55 PM
llvm/lib/Object/ArchiveWriter.cpp
557

FD might be a bit of misnomer (to me). I'd expect FD to refer to a file descriptor...

659

What about changing ReturnBuffer to be the return type Expected<std::unique_ptr<MemoryBuffer>> ?

663

Delete the empty line.

665

Delete the empty line.

jhenderson added inline comments.Aug 3 2020, 12:42 AM
llvm/lib/Object/ArchiveWriter.cpp
557

Perhaps writeTemporaryArchive (since it's an archive in a temp file)?

659

+1. Also, consider naming this writeArchiveToBuffer.

672–673

Honestly, I was a bit surprised to see writeArchiveBuffer messing about with temp files. The streaming approach would have been much closer to what I expected. That being said, if it's complicated to implement, I'm willing to let it through.

sameerarora101 marked 6 inline comments as done.Aug 3 2020, 7:44 AM
sameerarora101 updated this revision to Diff 282623.EditedAug 3 2020, 7:46 AM

Change name and signature of writeArchiveBuffer

smeenai added inline comments.Aug 3 2020, 11:54 AM
llvm/lib/Object/ArchiveWriter.cpp
672–673

I just learned about the existence of SmallVectorMemoryBuffer, which seems perfect for our use case :)

@sameerarora101, we can have writeTemporaryArchive take a raw_ostream &Out parameter. The temporary file creation and handling shifts into writeArchive. writeArchiveBuffer creates a SmallVector<char, 0> and a raw_svector_ostream from it, and then returns a SmallVectorMemoryBuffer created from its SmallVector. That should allow us to be efficient and also avoid any memory management issues :)

sameerarora101 added inline comments.Aug 3 2020, 12:03 PM
llvm/lib/Object/ArchiveWriter.cpp
672–673

oh wow, I'll try this out. Thanks a lot!

Implement writeArchiveBuffer using SmallVectorMemoryBuffer.

MaskRay added inline comments.Aug 3 2020, 4:13 PM
llvm/include/llvm/Object/ArchiveWriter.h
47

If OldArchiveBuf is not used (for now), delete it. It is a good idea adding a comment that this function is similar to writeArchive but has a different return type.

jhenderson added inline comments.Aug 4 2020, 12:19 AM
llvm/lib/Object/ArchiveWriter.cpp
557

The names need redoing again, since the logic has moved around a bit. I recommend this becomes writeArchive or writeARchiveToStream and the function that writes to a file becomes writeArchiveToFile or simply leave as writeArchive since the signatures are different.

Also, ArcName doesn't appear to be used any more?

644–654

I don't think this needs doing - the MemoryBuffer will be destroyed once it goes out-of-scope (I'm assuming that reset effectively just clears the buffer). In that case, there's no need for it to even be passed in. In fact, looking up the stack, there's no need for it to be passed into any of these functions as far as I can see?

sameerarora101 marked 2 inline comments as done.Aug 4 2020, 9:28 AM
sameerarora101 added inline comments.
llvm/include/llvm/Object/ArchiveWriter.h
47

Removed OldArchiveBuf from signature, thanks.

llvm/lib/Object/ArchiveWriter.cpp
557

Yup, removed ArcName from the signature.

644–654

Isn't it the case that we would need OldArchiveBuf.reset(); in writeArchive and not in writeARchiveToStream? This is because, as per the comment, we need to close all handles to the archive before calling Temp->keep(name), right? I have shifted this to writeArchive now. Please let me know if this is a wrong approach.

sameerarora101 marked 2 inline comments as done.

Removing OldArchiveBuf from writeArchiveToBuffer signature.

Looks good, but I think it'll help if you can upload the dependent patch (draft) so that people can see that the interface is suitable.

jhenderson accepted this revision.EditedAug 5 2020, 12:36 AM

LGTM! But please wait for others.

llvm/lib/Object/ArchiveWriter.cpp
644–654

Makes sense. Apparently I blanked half the comment...!

This revision is now accepted and ready to land.Aug 5 2020, 12:36 AM

Looks good, but I think it'll help if you can upload the dependent patch (draft) so that people can see that the interface is suitable.

You mean the next diff (dependent on this one) where I'll be using writeArchiveToBuffer? Yup, I plan to upload it soon. Thanks.

smeenai accepted this revision.Aug 10 2020, 7:44 PM

Looks great!

llvm/lib/Object/ArchiveWriter.cpp
639

I believe you need to explicitly call Temp->discard() in this path, otherwise you'll trip this assert. Note that discard() itself can return an error, in which case you probably want to do joinErrors?

MaskRay accepted this revision.Aug 10 2020, 9:42 PM

LGTM, with @smeenai's sys::fs::TempFile::discard comment addressed.

-DCMAKE_BUILD_TYPE=Debug builds enable assertions by default. For Release builds, you may want -DLLVM_ENABLE_ASSERTIONS=on

sameerarora101 marked an inline comment as done.Aug 11 2020, 8:06 AM
sameerarora101 added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
639

Oh makes sense, thanks. Added it now.

sameerarora101 marked an inline comment as done.

Add Temp->dicard() error.

Rebase to resolve previous build errors.

This revision was automatically updated to reflect the committed changes.