This is an archive of the discontinued LLVM Phabricator instance.

Discard uncompressed buffer after creating .gdb_index contents.
Closed, Public

Authored by ruiu on Sep 14 2018, 3:37 PM.

Details

Summary

Once we create .gdb_index contents, .zdebug_gnu_pub{names,types}
are useless, so there's no need to keep their uncompressed data
in memory.

I observed that for a test case in which lld creates a 3GB .gdb_index
section, the maximum resident set size dropped from 43GB to 29GB with
this patch.
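
As a rough illustration, here is a minimal, self-contained sketch of the idea; InputSection, uncompressedBuf, and readPubnames are simplified stand-ins for the real LLD types and functions, not the actual patch (which does the reads inside parallelForEachN):

  #include <memory>
  #include <vector>

  // Simplified stand-in for lld's input section type.
  struct InputSection {
    std::unique_ptr<char[]> uncompressedBuf; // filled when a .zdebug_* section is decompressed
  };

  void readPubnames(InputSection &sec) { /* feed .gdb_index creation */ }

  void createGdbIndex(std::vector<InputSection> &sections) {
    for (InputSection &sec : sections)
      readPubnames(sec);           // last use of the decompressed bytes
    for (InputSection &sec : sections)
      sec.uncompressedBuf.reset(); // discard them; peak memory drops
  }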

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

ruiu created this revision. Sep 14 2018, 3:37 PM
MaskRay accepted this revision. Sep 14 2018, 3:44 PM

Great finding! Just a question: did you move the code because the new place fits well?

This revision is now accepted and ready to land. Sep 14 2018, 3:44 PM
MaskRay added inline comments. Sep 14 2018, 3:46 PM
lld/ELF/SyntheticSections.cpp:2516 (On Diff #165603)

I don't know if, for std::unique_ptr, reset() is the more conventional spelling.
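
For reference, the two equivalent spellings under discussion (plain C++, nothing LLD-specific; both free the owned memory immediately):

  #include <memory>

  int main() {
    std::unique_ptr<char[]> buf(new char[1024]);
    buf.reset();     // the arguably more conventional spelling
    // buf = nullptr; // has the same effect: the array is freed here
  }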

ruiu added a comment. Sep 14 2018, 3:48 PM

> Great finding! Just a question: did you move the code because the new place fits well?

In the parallelForEachN above, we read the contents of .debug_gnu_pub{names,types}, so the previous position doesn't work.

ruiu updated this revision to Diff 165605. Sep 14 2018, 3:51 PM
  • use reset()
This revision was automatically updated to reflect the committed changes.

Out of curiosity: would it be possible/useful to avoid keeping all the pubnames sections uncompressed at the same time? Could they be processed one at a time (uncompress one, process it, delete it, uncompress the second, process it, delete it, etc.)?

ruiu added a comment. Sep 17 2018, 4:08 PM

It is doable, and perhaps we should do that. Currently we decompress all compressed sections before doing anything else, so that such sections are handled as if they weren't compressed at all, but sometimes that wastes both time and memory.
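
A sketch of the one-at-a-time scheme suggested above; Section, decompress(), and process() are hypothetical stand-ins for the real routines:

  #include <memory>
  #include <vector>

  struct Section { /* compressed bytes, sizes, ... */ };

  // Hypothetical helpers standing in for the real decompression and
  // .gdb_index-building code.
  std::unique_ptr<char[]> decompress(const Section &) { return nullptr; }
  void process(const char *) {}

  void processPubnames(std::vector<Section> &sections) {
    for (Section &sec : sections) {
      std::unique_ptr<char[]> buf = decompress(sec); // uncompress one section
      process(buf.get());                            // consume it
      // buf is destroyed at the end of each iteration, so at most one
      // decompressed pubnames section is live at a time.
    }
  }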

ruiu added a comment. Sep 17 2018, 4:11 PM

But one thing we need to keep in mind (and that's what I'm currently working on) is that if we discard a decompressed section buffer, we can't have StringRefs pointing into that section. That naturally affects the design, because we usually create a lot of StringRefs pointing into input sections to avoid the cost of copying. I don't yet have a good idea of how to write code that works well for both uncompressed and compressed sections.
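
To make the hazard concrete, a small sketch (the buffer and function names are made up; llvm::StringRef itself never owns memory):

  #include "llvm/ADT/StringRef.h"
  #include <memory>

  void example() {
    std::unique_ptr<char[]> uncompressedBuf(new char[64]{});
    // StringRef just points into the buffer; it does not copy or own it.
    llvm::StringRef name(uncompressedBuf.get(), 8);
    uncompressedBuf.reset(); // discard the decompressed section...
    // ...and 'name' now dangles: reading name[0] is a use-after-free.
    (void)name;
  }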

*nod* It might be possible (maybe too low-level; I'm not sure zlib exposes this, or that the format even allows it to be answered efficiently) to retrieve the size of a compressed section without decompressing it. At least in the zlib-gnu format, I think the uncompressed size is written before the compressed data, so it's easy there. That way more things wouldn't need to care about whether a section was compressed or not: it could be decompressed lazily, and then the other half would be to deallocate promptly, as soon as those bytes were finished with/written out to the output and no longer needed.
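
For the zlib-gnu case this is indeed cheap: a .zdebug_* section starts with a 4-byte "ZLIB" magic followed by the uncompressed size as a 64-bit big-endian integer. A small sketch of reading it (the function name is made up):

  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  // Returns true and sets 'out' if 'data' looks like a zlib-gnu
  // compressed section; the payload after the 12-byte header is the
  // raw zlib stream.
  bool getUncompressedSize(const uint8_t *data, size_t size, uint64_t &out) {
    if (size < 12 || memcmp(data, "ZLIB", 4) != 0)
      return false;
    out = 0;
    for (int i = 0; i < 8; ++i)
      out = (out << 8) | data[4 + i]; // decode big-endian size field
    return true;
  }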

At some point it'd even be great to use streaming compression in and out. (I guess you could probably even use streaming decompression for the pubnames, so even a whole object file's pubnames wouldn't need to be decompressed simultaneously: just ask for the next chunk of decompressed data, process it, then overwrite it with the next chunk, etc.)
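
That is essentially zlib's inflate API used in streaming mode. A sketch, assuming the input is one complete raw zlib stream (i.e. the 12-byte zlib-gnu header has already been skipped) and with a hypothetical processChunk() consumer:

  #include <zlib.h>
  #include <cstddef>

  void processChunk(const unsigned char *data, size_t len) { /* consume bytes */ }

  bool inflateInChunks(const unsigned char *in, size_t inSize) {
    z_stream strm = {}; // zero-init: zalloc/zfree/opaque are Z_NULL
    if (inflateInit(&strm) != Z_OK)
      return false;
    strm.next_in = const_cast<unsigned char *>(in);
    strm.avail_in = static_cast<uInt>(inSize);
    unsigned char chunk[4096]; // only this much decompressed data is live at once
    int ret;
    do {
      strm.next_out = chunk;
      strm.avail_out = sizeof(chunk);
      ret = inflate(&strm, Z_NO_FLUSH);
      if (ret != Z_OK && ret != Z_STREAM_END) {
        inflateEnd(&strm);
        return false;
      }
      processChunk(chunk, sizeof(chunk) - strm.avail_out);
    } while (ret != Z_STREAM_END);
    inflateEnd(&strm);
    return true;
  }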

ruiu added a comment. Sep 17 2018, 4:21 PM

This might be a silly question, but why do we compress only debug sections? If we really want to compress object files for valid reasons (e.g. reducing the amount of network traffic in a distributed build), we could simply compress entire object files instead of compressing only the debug sections. Then we could stream-uncompress the object files to disk and run the linker on them.

Fair question. For Google, stream-uncompressing to disk wouldn't help matters: disk is a ramfs, so it's the same as uncompressing the whole thing to a buffer in memory, which hurts a bit (due to memory limits).

For somewhat more "normal" users (I assume compressed debug info was probably implemented before Google's needs, but I could be wrong there; maybe Google folks implemented it in gold/gcc before the LLVM switch), especially pre-Fission, debug info was the big culprit. Though that doesn't mean it was better to compress just it rather than compressing everything. Perhaps keeping the object file as the outer container meant more things continued to "just work": objdump, etc., things that relied only on the object file headers and not the section contents.