This is an archive of the discontinued LLVM Phabricator instance.

ELF: Use bump pointer allocator for uncompressed section buffers. NFCI.
ClosedPublic

Authored by pcc on Mar 12 2019, 11:16 AM.

Details

Summary

This shaves another word off SectionBase and makes it possible to clone a
section using the implicit copy constructor.

This basically reverts r311056, which removed the mutex in order to
make the code easier to understand. On balance I think it's probably more
straightforward to have a mutex here than to have an unusual copy constructor
in SectionBase.

Event Timeline

pcc created this revision.Mar 12 2019, 11:16 AM
ruiu added a comment.Mar 12 2019, 11:23 AM

I wonder how much does it actually cost to acquire a lock under the situation where there's no contention. Except this place, all accesses to BAlloc are already serialized, so there's usually no contention. If the cost of acquiring a lock under that situation is negligible, I'd imagine that always acquiring the lock could be a reasonable option.

pcc added a comment.Mar 12 2019, 12:24 PM

I wonder how much does it actually cost to acquire a lock under the situation where there's no contention. Except this place, all accesses to BAlloc are already serialized, so there's usually no contention. If the cost of acquiring a lock under that situation is negligible, I'd imagine that always acquiring the lock could be a reasonable option.

The only other user of BAlloc is StringSaver (all other allocations use a SpecificBumpPtrAllocator). Are you proposing to change BumpPtrAllocator::Allocate to take a lock?

ruiu added a comment.Mar 12 2019, 12:40 PM

I thought that we could protect BumpPtrAllocator::Allocate with a mutex, but since StringSaver directly calls that function, there's no way to acquire a lock before calling that function. Maybe we should protect StringSaver::save? Or, we could also make BAlloc and StringSaver thread-local variables. That might be a better approach.

pcc added a comment.EditedMar 12 2019, 1:06 PM

I thought that we could protect BumpPtrAllocator::Allocate with a mutex, but since StringSaver directly calls that function, there's no way to acquire a lock before calling that function. Maybe we should protect StringSaver::save?

We could, I suppose? But then that code would be effectively dead since we don't save any strings on threads other than the main thread right now. I'm more inclined to let TSan find any mistakes involving BumpPtrAllocator here.

Or, we could also make BAlloc and StringSaver thread-local variables. That might be a better approach.

I'm not sure if that will work. Imagine that we spin up a thread that decompresses a section and then terminates. When the thread terminates, the allocator is destroyed, so we are left with a dangling pointer in the section object.

We could work around that by having the thread local variables be pointers to objects allocated on the heap, and arrange for them to be destroyed later in lld::freeArena. But now we're back to the original problem of thread-safely adding a "thing to deallocate later" to a list.

ruiu accepted this revision.Mar 12 2019, 1:13 PM
In D59269#1426696, @pcc wrote:

I thought that we could protect BumpPtrAllocator::Allocate with a mutex, but since StringSaver directly calls that function, there's no way to acquire a lock before calling that function. Maybe we should protect StringSaver::save?

We could, I suppose? But then that code would be effectively dead since we don't save any strings on threads other than the main thread right now. I'm more inclined to let TSan find any mistakes involving BumpPtrAllocator here.

Or, we could also make BAlloc and StringSaver thread-local variables. That might be a better approach.

I'm not sure if that will work. Imagine that we spin up a thread that decompresses a section and then terminates. When the thread terminates, the allocator is destroyed, so we are left with a dangling pointer in the section object.

We could work around that by having the thread local variables be pointers to objects allocated on the heap, and arrange for them to be destroyed later in lld::freeArena. But now we're back to the original problem of thread-safely adding a "thing to deallocate later" to a list.

Hmm, you are right. My approach won't work. The other thing I can think of is to make BumpPtrAllocator thread-safe and mostly lock-free using compare-and-exchange. That doesn't seem technically not too hard, but that's probably too much. I think I'm fine with your original approach.

LGTM

This revision is now accepted and ready to land.Mar 12 2019, 1:13 PM
This revision was automatically updated to reflect the committed changes.

This patch decreases memory usage by 1% when linking a huge internal target with tcmalloc-based lld. Nice job! (However, on the other hand, it makes early freeing pattern impossible; doing that might have been rejected, though)

grimar added inline comments.Mar 13 2019, 1:22 AM
ELF/InputSection.h
217 ↗(On Diff #190320)

Not relative to this patch, but I wonder why this is an int64_t and not a uint64_t.

ruiu added inline comments.Mar 13 2019, 11:15 AM
ELF/InputSection.h
217 ↗(On Diff #190320)

To represent -1 as -1, I guess?

grimar added inline comments.Mar 15 2019, 2:20 AM
ELF/InputSection.h
217 ↗(On Diff #190320)

But it could be uint64_t, we already have code around in linker that does that.
E.g:

// Get an offset in an output section this relocation is applied to.
uint64_t Offset = GetOffset.get(Rel.r_offset);
if (Offset == uint64_t(-1))
  return;

I am not sure it is useful atm to have a max size of UncompressedSize to be max(uint64)-1
instead of max(int64_t), but why not...

ruiu added inline comments.Mar 18 2019, 1:36 PM
ELF/InputSection.h
217 ↗(On Diff #190320)

I'm not too worried, but if you want to consistently use uint64_t, feel free to do that. By the way uint64_t(-1) has the same bit pattern as UINT64_MAX.