This is an archive of the discontinued LLVM Phabricator instance.

[Support] Add MemoryBuffer::getFileSlice()
ClosedPublic

Authored by kledzik on Sep 19 2014, 6:35 PM.

Details

Reviewers
rafael
Summary

mach-o supports "fat" files which are a header/table-of-contents followed by a concatenation of mach-o files of different architectures. Currently, MemoryBuffer has no easy way to map a subrange (slice) of a file which lld will need to select a mach-o slice of a fat file. The new function provides an easy way to map a slice of a file into a MemoryBuffer. Test case included.

Diff Detail

Event Timeline

kledzik updated this revision to Diff 13901.Sep 19 2014, 6:35 PM
kledzik retitled this revision from to [Support] Add MemoryBuffer::getFileSlice().
kledzik updated this object.
kledzik edited the test plan for this revision. (Show Details)
kledzik added a reviewer: rafael.
kledzik added a subscriber: Unknown Object (MLST).
silvas added a subscriber: silvas.Sep 25 2014, 3:24 PM

What are the downsides to mapping the whole file and then taking a StringRef to the sub-file?

There are two issues:

  1. Apple has hundreds of dylibs and they are usually fat. A 32-bit process could run out of address space if it had to map the entirety of every fat file.
  1. The lld the structure is that the InputGraphNode maps the file, then some Reader claims it and parses it. The parsing may or may not have pointers into the original buffer. If it does, the created lld::File object takes ownership of the buffer as part of parsing. If not, the ownership stays with the InputGraphNode which quickly frees it. In that model, you cannot pass down a StringRef because that cannot transfer ownership of the original MemoryBuffer. If MemoryBuffer had getFileSlice(), then the darwin InputGraphNode can check if the file is fat, and if so, only map the relevant slice. That would solve the address space issue and fit into the lld MemoryBuffer ownership model.
rafael edited edge metadata.Oct 7 2014, 12:23 PM

I think this is a good idea. It is perfectly reasonable for a tool to know that it needs only a part of a file and let the OS unmap the rest.

I noticed some horrible inconsistencies in the pre existing code and fixed them. I should hopefully make this change a bit simpler.

include/llvm/Support/MemoryBuffer.h
135

For consistency with getOpenFileSlice, please make this

getFileSlice(const Twine &Filename, uint64_t MapSize,  int64_t Offset);

or update getOpenFileSlice.

lib/Support/MemoryBuffer.cpp
174

Can't you just add an MapSize argument to getFileAux?

kledzik updated this revision to Diff 14530.Oct 7 2014, 2:21 PM
kledzik edited edge metadata.

Rework to use getFileAux()

rafael added inline comments.Oct 7 2014, 2:43 PM
include/llvm/Support/MemoryBuffer.h
131

s/Length/MapSize/

Pass MapSize before Offset.

Alternatively, update getOpenFileSlice to have this order/name. My only preference is that they match.

lib/Support/MemoryBuffer.cpp
335

Why was it necessary to move this?

Currently if the user passes the MapSize and we don't require a null terminator, we never call stat in here, since we don't need the file size.

I now notice that we call stat in mapped_file_region, which is a regression from when we transitioned to mapped_file_region :-(

In any case , if we can avoid an extra stat we should.

include/llvm/Support/MemoryBuffer.h
131

Fixed to match getOpenFileSlice

lib/Support/MemoryBuffer.cpp
335

My thought was that since getFileSlice() provides a MapSize but not FileSize, we need to determine the file size.

But it seems we don't actually need the FileSize when mapping. The stat() looks like it is there to set the FileSize, but it is really there to get the MapSize.

kledzik updated this revision to Diff 14536.Oct 7 2014, 3:49 PM

fixes from latest Rafael comments.

rafael accepted this revision.Oct 7 2014, 5:12 PM
rafael edited edge metadata.

LGTM

This revision is now accepted and ready to land.Oct 7 2014, 5:12 PM
kledzik closed this revision.Oct 7 2014, 5:33 PM

committed in r219260