This is an archive of the discontinued LLVM Phabricator instance.

[ELF] --gdb-index: split off GdbSymbol::CuVector and add a separate CuVectors
Abandoned, Public

Authored by MaskRay on Feb 14 2019, 8:53 PM.

Details

Summary

Each GdbSymbol::CuVector causes a memory allocation, and together they
cost a lot of memory. This patch splits off the field and adds a separate
CuVectors to be more memory efficient.

For one of our large internal targets, there are 4791276 symbols and the
sum of the sizes (capacities) of GdbSymbol::CuVector is 19740000 (26185902).
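
To illustrate the idea (this is not the code from the patch), here is a minimal sketch of moving per-symbol vectors into one shared pool, so that each symbol stores only a range instead of owning an allocation with unused capacity. The names SymbolWithVector, SymbolWithRange, CuVectorPool and flatten are hypothetical and do not appear in lld; the element type uint32_t is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical "before" layout: every symbol owns a heap-allocated vector,
// so each symbol pays for one allocation plus any unused capacity.
struct SymbolWithVector {
  uint32_t nameOff = 0;
  std::vector<uint32_t> cuVector;
};

// Hypothetical "after" layout: a symbol only records a range into a pool.
struct SymbolWithRange {
  uint32_t nameOff = 0;
  uint32_t cuBegin = 0; // index of the first entry in CuVectorPool::entries
  uint32_t cuCount = 0; // number of entries belonging to this symbol
};

// One contiguous array that holds the entries of all symbols back to back.
struct CuVectorPool {
  std::vector<uint32_t> entries;
};

// Copies every per-symbol vector into the pool and returns the range-based
// symbols. The pool is reserved once, so it carries no per-symbol slack.
std::vector<SymbolWithRange> flatten(const std::vector<SymbolWithVector> &in,
                                     CuVectorPool &pool) {
  std::size_t total = 0;
  for (const SymbolWithVector &s : in)
    total += s.cuVector.size();
  assert(pool.entries.size() + total <= UINT32_MAX && "range index overflow");
  pool.entries.reserve(pool.entries.size() + total);

  std::vector<SymbolWithRange> out;
  out.reserve(in.size());
  for (const SymbolWithVector &s : in) {
    SymbolWithRange r;
    r.nameOff = s.nameOff;
    r.cuBegin = static_cast<uint32_t>(pool.entries.size());
    r.cuCount = static_cast<uint32_t>(s.cuVector.size());
    pool.entries.insert(pool.entries.end(), s.cuVector.begin(),
                        s.cuVector.end());
    out.push_back(r);
  }
  return out;
}
```

In this sketch the trade-off is that the pool can only be laid out once all entries are known, which is one reason such a layout may interact poorly with building the vectors in parallel.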

Before: 24.820 seconds, 13.74GiB
After: 24.175 seconds, 13.21GiB

As a comparison,
/usr/bin/gold (Debian): 134.29 seconds, 12.04GiB
lld --no-gdb-index: 20.619 seconds, 9.12GiB

Event Timeline

MaskRay created this revision. Feb 14 2019, 8:53 PM

The current parallelism scheme may have reached a local optimum, and it is hard to improve further. I have tried another approach, D58276, which may have greater potential for improvement. For D58276, the loss of parallelism is my concern.

ruiu added inline comments. Feb 15 2019, 3:31 PM
ELF/SyntheticSections.cpp
2484

Doesn't this make a copy of a vector? Is this vector always small?

2543

Perhaps I'm missing something, but why do you have to create both the GdbIndex vector and CuVectors in this function? I wonder if you can split it up into two functions.

MaskRay abandoned this revision. Edited Mar 6 2019, 12:12 AM

I'll abandon this revision. This does decrease the memory footprint (~3.85% in two of our large internal executables) for glibc-allocator-based lld without a performance hit, but unexpectedly increases the memory footprint for our internal tcmalloc-based lld. I also don't like the additional complexity.

I have a strong belief (I actually wanted to call it impossible, but just didn't want to make an absolute assertion :) ): we can't decrease the memory usage of .gdb_index without a performance hit. I sorta blame the fact that the function has been optimized too well for performance :( If we can put less emphasis on performance, https://reviews.llvm.org/D58276 (and https://reviews.llvm.org/differential/diff/187004) is a more feasible direction for decreasing memory usage. Internally, we care about memory usage a lot. The performance is already very good (2x ~ 4x faster than gold for a wide range of applications), so some sacrifice on it is totally acceptable. On the other hand, we have some hard memory usage limits, and the current memory footprint characteristics make some huge targets unable to link.

ruiu added a comment. Mar 11 2019, 5:33 PM

I have a strong belief (I actually wanted to call it impossible, but just didn't want to make an absolute assertion :) ): we can't decrease the memory usage of .gdb_index without a performance hit. I sorta blame the fact that the function has been optimized too well for performance :( If we can put less emphasis on performance, https://reviews.llvm.org/D58276 (and https://reviews.llvm.org/differential/diff/187004) is a more feasible direction for decreasing memory usage. Internally, we care about memory usage a lot. The performance is already very good (2x ~ 4x faster than gold for a wide range of applications), so some sacrifice on it is totally acceptable. On the other hand, we have some hard memory usage limits, and the current memory footprint characteristics make some huge targets unable to link.

That is my perception too; it seems nearly impossible to reduce the memory usage without sacrificing speed.

ruiu added a comment. Mar 11 2019, 5:38 PM

(I'm sorry, I sent it prematurely.)

That is my perception too; it seems nearly impossible to reduce the memory usage without sacrificing speed. If memory consumption is a problem for most users, we probably should choose memory reduction over speed, but I don't think our use case within Google is strong enough to change that design choice. In lld, we parallelize things if we are handling a massive number of the same kind of objects, and this perfectly matches that pattern. It'd be pretty odd if this were the only place where we didn't do so.

Fortunately, there is a workaround for it: we could build a binary without --gdb-index and then add the index using gdb as a post-processing step. Maybe we should live with that.