This is an archive of the discontinued LLVM Phabricator instance.

BumpPtr allocate SimpleReferences
ClosedPublic

Authored by kledzik on Dec 3 2014, 5:13 PM.

Details

Summary

The mach-o part of lld has a leak because it BumpPtr allocates SimpleDefinedAtoms. The problem is SimpleDefinedAtom has a ivar that is a vector of SimpleReferences. Since BumpPtr allocated objects are not destructed, the allocations done in the vector are leaked.

This fix is to add a BumpPtrAllocator to the base lld::File class and alter SimpleDefinedAtom to allocate its SimpleReference in the BumpPtr pool. The mess is that vectors don't BumpPtrAllocate well, so I switched to using a linked list (llvm::ilist<>) for the SimpleReference.

I tried adding the BumpPtrAllocator instead to some subclasses of lld::File, but failed because a handful of places add SimpleDefinedAtom to lots of different subclasses of File. And since the general model is that Atoms are owned by their File, it makes sense to have an allocator in the File for allocating various objects related to the file or atoms (e.g. strings).

Diff Detail

Event Timeline

kledzik updated this revision to Diff 16901.Dec 3 2014, 5:13 PM
kledzik retitled this revision from to BumpPtr allocate SimpleReferences.
kledzik updated this object.
kledzik edited the test plan for this revision. (Show Details)
kledzik added a project: lld.
kledzik added a subscriber: Unknown Object (MLST).
ruiu edited edge metadata.Dec 3 2014, 6:46 PM

Why don't you make a new instance of Reference using operator new and store the pointer to File's std::vector<std::unique_ptr<Reference>>? If you do that, Reference's destructor would be called. Is there any reason to continue using BumpPtrAllocator?

Even if you add a std::vector<std::unique_ptr<Reference>> to lld::File (so the SimpleReferences are deleted when the File is deleted), you still need a std::vector<SimpleReference*> in each SimpleDefineAtom to track which references are used by that atom. And the allocations for that vector would be leaked if the Atom is BumpPtr allocated.

References are not that big and are never removed. The current std::vector<SimpleReference> in SimpleDefinedAtom does a lot of copying as references are added. This is a good use case for BumpPtr allocation.

The only other sane thing to do is never BumpPtr allocate Simple Atoms. But that is sad because BumpPtr allocation is an efficient way to create atoms while parsing a file.

I like the idea of using the file allocator for all objects that would be owned by that file. If you keep a vector of std::vector<SimpleReference *> and allocate the simple references by using the file allocator, wouldn't it suffice ?

This would essentially destroy all the references when the File goes away, Am I missing something ?

Jean-Daniel edited edge metadata.Dec 4 2014, 12:51 AM
In D6518#7, @shankarke wrote:

This would essentially destroy all the references when the File goes away, Am I missing something ?

Yes, you miss the memory allocated by the vector to store the pointers themselves. The problem is that whatever you store in the vector, it need to allocate storage, and as the vector destructor is never called, that memory will leaks.

We could use include/clang/Analysis/Support/BumpVector.h which simplifies things. For this the header file has to be moved to llvm/Support.

http://clang.llvm.org/doxygen/BumpVector_8h_source.html.

The vector class takes a

BumpVectorContext(llvm::BumpPtrAllocator &A)

ruiu added a comment.Dec 4 2014, 11:11 AM

I think the other way is to pass a custom Allocator backed by a BumpPtr owned by File to std::vector owned by DefinedAtom, so that the vector in a DefinedAtom is freed when the file is deleted. I'd expect it would be faster than the linked list (because of better memory locality). Also could eliminate the copy-to-vector-and-sort code.

A downside to BumpVector is that as the vector grows (by adding References), the capacity is doubled by bump allocating a new chunk. All the previous allocations (e.g. if current at 8 elements, the allocs for 1, 2, and 4) are wasted space in the bumpptr pool.

A linked list does not have this problem because the elements never move and are never copying. They just have a pointer to the next one.

But using the BumpVector makes the code cleaner too right ? and also the memory sanitizers will not show any more leaks as they are going to be freed.

Another possiblity would be to resize the BumpVector by the size of references we need completely(by counting the number of relocation records).

But using the BumpVector makes the code cleaner too right ?

Only because you are adding new BumpVector code. We could also add a new BumpList class to ADT and that would make the lld code very clean too.

and also the memory sanitizers will not show any more leaks as they are going to be freed.

That is a false negative. There will be leaks (unuseable memory lost) in the BumPtr pools.

Another possiblity would be to resize the BumpVector by the size of references we need completely(by counting the number of relocation records).

On mach-o, there is not a one-to-one mapping of relocations to references. Also that would take an extra pass to group relocs by atom before creating the References. Right now, it takes one pass to iterate across all relocs, create a Reference, then figure out what atom to add it to.

colinl added a subscriber: colinl.EditedDec 4 2014, 11:59 AM

Won't "RefList _reference" leak the nodes of the linked list in the same way it leaks the vector backing allocation? Without an allocator template llvm:ilist is using new under the hood to allocate list nodes.

N/M, the pointers are internal to the items themselves.

Won't "RefList _reference" leak the nodes of the linked list

They are allocated in the BumpPtr pool at:

node = new (_file.allocator()) SimpleReference(...)

It doesn't seem like there's any memory different between the linked list version and the bumpvector version. The linked list version requires 2*(pointer_size)*N memory and if a bumpvector is resized to double each time, the sum of all intermediate backing allocations comes out the same.

16 Items in a linked list requires 2*16*pointer
16 Items in a bumpvector requires (1+2+4+8+16)*pointer

ruiu added a comment.Dec 4 2014, 1:20 PM

I think Colin's calculation is correct. That makes me think of whether or
not we really want to use BumpPtr allocator at all. We might be trying to
optimize too much without knowing if this is important for performance (I
could be easily convinced if there's benchmark result). Regular memory
allocator shouldn't be that slow, and instead of doing something clever
with BumpPtr, we can always use operator new and smart pointers.

Colin's calculation is based on the bumpvector containing only the *pointer* to the Reference. The current implementation has a vector< SimpleReference>, so the wasted space is not just a pointer size (repeatedly doubled). A SimpleReference is 64 bytes in size, so that value keeps getting doubled (and wasted).

Even if we switched to using a bumpvector for a pointer to reference, we still need to allocate the SimpleReference body somewhere.

For Atoms and References, the best allocator is a BumpPtr allocator. Because, 1) Atoms (and References) are owned by their File and ownership never changes, 2) they are never destructed (until the File is destructed). If either of those was false, then we could not use BumpPtr.

The reason for this patch is because the current SimpleDefinedAtom has an embedded vector which means SimpleDefinedAtom (and subclasses) cannot be BumpPtr allocated (without leaking). If we don't want to change SimpleDefinedAtom, I can change MachODefinedATom to not subclass SimpleDefinedAtom.

ruiu added a comment.Dec 4 2014, 2:18 PM

I was thinking that Colin has suggested to store pointers to the vector
instead of the object itself.

I think the patch is good to land. There seems to be other ways to do the
same thing, but the code is mostly at one place, so we can revisit this
topic if we need to. This patch wouldn't harm anyone who doesn't touch this
code.

And the memory leak is a real issue which has to be resolved.

ruiu added a comment.Dec 5 2014, 11:18 AM

Looks like no serious concerns regarding this change. LGTM in case you need this word.

Just a side note. After this patch, we should probably take some time to review all subclasses of File that also provide a bump allocator and remove it to use the File's one instead.

kledzik accepted this revision.Dec 5 2014, 2:17 PM
kledzik added a reviewer: kledzik.

commited in r 223528

This revision is now accepted and ready to land.Dec 5 2014, 2:17 PM