mach-o supports "fat" files which are a header/table-of-contents followed by a concatenation of mach-o files of different architectures. Currently, MemoryBuffer has no easy way to map a subrange (slice) of a file which lld will need to select a mach-o slice of a fat file. The new function provides an easy way to map a slice of a file into a MemoryBuffer. Test case included.
Details
Diff Detail
Event Timeline
What are the downsides to mapping the whole file and then taking a StringRef to the sub-file?
There are two issues:
- Apple has hundreds of dylibs and they are usually fat. A 32-bit process could run out of address space if it had to map the entirety of every fat file.
- The lld the structure is that the InputGraphNode maps the file, then some Reader claims it and parses it. The parsing may or may not have pointers into the original buffer. If it does, the created lld::File object takes ownership of the buffer as part of parsing. If not, the ownership stays with the InputGraphNode which quickly frees it. In that model, you cannot pass down a StringRef because that cannot transfer ownership of the original MemoryBuffer. If MemoryBuffer had getFileSlice(), then the darwin InputGraphNode can check if the file is fat, and if so, only map the relevant slice. That would solve the address space issue and fit into the lld MemoryBuffer ownership model.
I think this is a good idea. It is perfectly reasonable for a tool to know that it needs only a part of a file and let the OS unmap the rest.
I noticed some horrible inconsistencies in the pre existing code and fixed them. I should hopefully make this change a bit simpler.
include/llvm/Support/MemoryBuffer.h | ||
---|---|---|
135 | For consistency with getOpenFileSlice, please make this getFileSlice(const Twine &Filename, uint64_t MapSize, int64_t Offset); or update getOpenFileSlice. | |
lib/Support/MemoryBuffer.cpp | ||
174 | Can't you just add an MapSize argument to getFileAux? |
include/llvm/Support/MemoryBuffer.h | ||
---|---|---|
131 | s/Length/MapSize/ Pass MapSize before Offset. Alternatively, update getOpenFileSlice to have this order/name. My only preference is that they match. | |
lib/Support/MemoryBuffer.cpp | ||
335 | Why was it necessary to move this? Currently if the user passes the MapSize and we don't require a null terminator, we never call stat in here, since we don't need the file size. I now notice that we call stat in mapped_file_region, which is a regression from when we transitioned to mapped_file_region :-( In any case , if we can avoid an extra stat we should. |
include/llvm/Support/MemoryBuffer.h | ||
---|---|---|
131 | Fixed to match getOpenFileSlice | |
lib/Support/MemoryBuffer.cpp | ||
335 | My thought was that since getFileSlice() provides a MapSize but not FileSize, we need to determine the file size. But it seems we don't actually need the FileSize when mapping. The stat() looks like it is there to set the FileSize, but it is really there to get the MapSize. |
s/Length/MapSize/
Pass MapSize before Offset.
Alternatively, update getOpenFileSlice to have this order/name. My only preference is that they match.