Page MenuHomePhabricator

WIP: [lldb/FileSystem] Add & use CreateReadonlyDataBuffer where possible
Changes PlannedPublic

Authored by vsk on Feb 14 2020, 5:45 PM.

Details

Summary

Add FileSystem::CreateReadonlyDataBuffer to allow lldb to open files
using mmap() (i.e. without any heap allocation).

There's no functionality change intended here. We've been getting
reports of lldb using 2GB+ of heap memory while debugging Xcode [1], and
I think this should help with that.

This is WIP because ObjectFileELF::RelocateDebugSections mutates a buffer
obtained from ObjectFile. SymbolFile/DWARF/parallel-indexing-stress.s is the
only failing test, everything else passes with the current patch (on Darwin).

rdar://53785446

[1] heap report from two different users:

Count Bytes Avg Size Symbol
11404 2143015136 187917.8 (anonymous namespace)::MemoryBufferMem<llvm::WritableMemoryBuffer> C++ LLDB
11948 2537624624 212389.1 (anonymous namespace)::MemoryBufferMem<llvm::WritableMemoryBuffer> C++ LLDB

Diff Detail

Event Timeline

vsk created this revision.Feb 14 2020, 5:45 PM

+ @labath @rupprecht, my tentative plan is to fix this by having ObjectFileELF make a copy into a heap-allocated buffer when applying relocations. Please lmk if that's problematic (although, that seems like a slightly lazier version of what we do today, so hopefully it's all right).

Before you get too carried away with this, I'd like us to clarify something. Your comment seems to imply that FileSystem::CreateDataBuffer does not use mmap. That doesn't sound right, and it's not what I see happening on linux now (and I don't see why macos would be different):

$ strace -e trace=file,desc bin/lldb bin/lldb -o "image dump sections" -b
...
openat(AT_FDCWD, "bin/lldb", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0770, st_size=414096, ...}) = 0
mmap(NULL, 414096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 4, 0) = 0x7f94160d7000

What this does is create a mapping, where unmodified pages are backed by the file on disk, and memory is allocated for any modified pages in the usual copy-on-write fashion. So, if you don't modify any pages -- you get no allocations.

It is true that FileSystem::CreateDataBuffer can sometimes use a heap allocation (malloc) to fulfill the request, but this has nothing to do with the writability of that buffer. What happens is that if we detect that the file we are about to read comes from a "remote" system (NFS, etc.), then we will fall back to malloc+read, because mmaps of volatile files are somewhat precarious.

For this reason, I am somewhat doubtful that this patch will solve the memory problems with memory usage, except maybe by playing some accounting tricks, where a read-only mapping would get reported in a different bucket than a read-write COW one. So, I gotta ask: How sure are you that this patch will solve the problems reported by your users? Is it possible that these two users have some kinds of remote mounts or something similar that could be throwing us off track?

Regardless of the answers to the questions above, I think that this patch, in principle, is a good idea, but I think we should encode the mutability of the DataBuffer in the type system. If we can ensure that CreateReadonlyDataBuffer returns a const DataBuffer, then we can have the compiler check for us that noone writes to these buffers, instead of us trying to guess.

As for ObjectFileELF, relocating the data inside RelocateDebugSections would basically reimplement what the COW mmap gives us for free. So, I think a simpler solution is to just make sure ObjectFileELF always creates a read-write mapping for the object file.

vsk planned changes to this revision.Feb 17 2020, 4:22 PM

@labath thank you for your comments. I'm not yet sure what explains the average size of the WritableMemoryBuffer's allocated on our users' machines (~ 187,918 bytes, comfortably larger than the 16K threshold in shouldUseMmap). I'll need to figure this out before going any further.

I did expect openFileAux to always malloc() when constructing a WritableMemoryBuffer, forgetting that a MAP_PRIVATE mapping would also work. Generally, on Darwin (& probably elsewhere) we want to use MAP_PRIVATE mappings as much as possible, as these do not need to get written to swap space or stashed in the VM compressor.