This is an archive of the discontinued LLVM Phabricator instance.

Avoid unnecessary buffer allocation and memcpy for compressed sections.
ClosedPublic

Authored by ruiu on Oct 4 2018, 4:47 PM.

Details

Summary

Previously, we uncompress all compressed sections before doing anything.
That works, and that is conceptually simple, but that could results in
a waste of CPU time and memory if uncompressed sections are then
discarded or just copied to the output buffer.

In particular, if .debug_gnu_pub{names,types} are compressed and if no
-gdb-index option is given, we wasted CPU and memory because we
uncompress them into newly allocated bufers and then memcpy the buffers
to the output buffer. That temporary buffer was redundant.

This patch changes how to uncompress sections. Now, compressed sections
are uncompressed lazily. To do that, Data member of InputSectionBase
is now hidden from outside, and data() accessor automatically expands
an compressed buffer if necessary.

If no one calls data(), then writeTo() directly uncompresses
compressed data into the output buffer. That eliminates the redundant
memory allocation and redundant memcpy.

This patch significantly reduces memory consumption (20 GiB max RSS to
15 Gib) for an executable whose .debug_gnu_pub{names,types} are in total
5 GiB in an uncompressed form.

Event Timeline

ruiu created this revision.Oct 4 2018, 4:47 PM

Out of curiosity - there's one relocation in each pubnames contribution. Does this solution (or future intended work) manage to avoid decompressing all the input compressed sections into memory, to then apply the relocation, and then write out the buffer (instead either streaming compressed bytes from input to output (either recompressing along the way, or not, as needed))?

Would be great to hear that this generalizes/handles relocations - to help with other compressed debug sections as well.

Though there are some debug sections which may have no relocations (debug_loclists and debug_rnglists maybe, debug_abbrev for sure) which would benefit even if this doesn't help sections with relocs.

ruiu added a comment.Oct 4 2018, 5:04 PM

In lld, we do apply relocations after we copy input section contents to the output buffer, so this patch doesn't affect how we apply relocations. If a compressed section contains relocations, they are processed fine like before.

In lld, we do apply relocations after we copy input section contents to the output buffer, so this patch doesn't affect how we apply relocations. If a compressed section contains relocations, they are processed fine like before.

Ah, awesome - should've guessed that might be the case. This sounds like it could be really helpful for all debug sections, then!

I see this uncompresses directly into the output, which is also great - no need to have the whole uncompressed contents in memory at any one time, then.

Out of curiosity - does LLD support compressed output? How/where does it do that? It'd be super nifty if it'd stream in the compressed data and stream it out through compression without having to have it all decompressed in memory. But that might hit problems with relocations that use the bytes in the section as part of the relocation computation :/

lld/ELF/InputSection.cpp
148

Did you intend to zero out this allocation? (the '()' after the [] will cause the array to be zero initialized, I think - which might be unnecessary if you're about to write to all the bytes anyway?)

ruiu added a comment.Oct 4 2018, 5:36 PM

Out of curiosity - does LLD support compressed output? How/where does it do that? It'd be super nifty if it'd stream in the compressed data and stream it out through compression without having to have it all decompressed in memory. But that might hit problems with relocations that use the bytes in the section as part of the relocation computation :/

lld supports compressed output. When an output section compression is enabled, we allocate a temporary buffer and pass that buffer to writeTo() as if it were a real output buffer. Then, writeTo() copies input sections to a given buffer and apply relocations. We then compress it and write to a real output buffer.

As you guessed, I don't think we can stream-compress output section contents because relocation application is essentially a random access to the output buffer.

lld/ELF/InputSection.cpp
148

Ah thanks. We don't have to initialize the buffer.

MaskRay added inline comments.Oct 4 2018, 5:36 PM
lld/ELF/InputSection.cpp
148

How about omitting the initializer (new char[Size]) to save value initialization of each char (filling with zeroes)

MaskRay added inline comments.Oct 4 2018, 5:39 PM
lld/test/ELF/relocatable-compressed-input.s
10

rLLD324944 did s/uncompress/decompress/g. If you now feel that uncompress is better, you may make these changes in a separate commit.

ruiu added inline comments.Oct 4 2018, 5:41 PM
lld/test/ELF/relocatable-compressed-input.s
10

I understand the point but this is sort of related to this change because now I'm using zlib::uncompress instead of Decompressor. I think Decompressor should have been named Uncompressor in the first place, but I'm following the local convention (which is now "uncompress").

ruiu updated this revision to Diff 168419.Oct 4 2018, 5:50 PM
  • do not zero-initialize a temporary buffer
  • inline data()
MaskRay added inline comments.Oct 4 2018, 5:53 PM
lld/ELF/InputSection.cpp
152

It seems uncompress() is only called once. What do you think if uncompress() is inlined here and these mutable member variables are marked as mutable?

ruiu added inline comments.Oct 4 2018, 5:56 PM
lld/ELF/InputSection.cpp
152

I could, but I'm not sure if it is a good thing to mark RawData and UncompressedBuf mutable throughout this class. At least in this way, only this function mutates these members.

grimar added a comment.Oct 5 2018, 3:10 AM

Idea by itself LG. My suggestions about implementation details are below.

lld/ELF/InputSection.cpp
266

What do you think about still using the Decompressor here for parsing the
headers and checking zlib availability?
Like below.

Decompressor Dec = check(Decompressor::create(Name, toStringRef(RawData),
                                              Config->IsLE, Config->Is64));
UncompressedSize = Dec.getDecompressedSize();
if (Name.startswith(".zdebug")) {
  RawData = RawData.slice(12);
  Name = Saver.save("." + Name.substr(2));
  return;
}
assert(Flags & SHF_COMPRESSED);
Flags &= ~(uint64_t)SHF_COMPRESSED;

// New-style 64-bit header
if (Config->Is64)
  RawData = RawData.slice(sizeof(Elf64_Chdr));
else
  RawData = RawData.slice(sizeof(Elf32_Chdr));

or, if we add getCompressedData method to Decompressor class, it can be:

Decompressor Dec = check(Decompressor::create(Name, toStringRef(RawData),
                                              Config->IsLE, Config->Is64));
UncompressedSize = Dec.getDecompressedSize();
RawData = Dec.getCompressedData();

if (Name.startswith(".zdebug")) {
  Name = Saver.save("." + Name.substr(2));
  return;
}

assert(Flags & SHF_COMPRESSED);
Flags &= ~(uint64_t)SHF_COMPRESSED;
lld/test/ELF/strip-debug.s
3

If you remove --strip-debug, this will crash the linker because of
llvm_unreachable("zlib::uncompress is unavailable"); on windows, where zlib is not available.

You may want to check zlib::isAvailable() or use Decompressor.

ruiu updated this revision to Diff 168480.Oct 5 2018, 9:47 AM
  • use zlib::isAvailable
lld/ELF/InputSection.cpp
266

I found that Decompressor hides too many details and is not easy to use. I feel more comfortable to do this explicitly in this file.

lld/test/ELF/strip-debug.s
3

Done.

dblaikie added inline comments.Oct 5 2018, 9:54 AM
lld/ELF/InputSection.cpp
152

Personally I'd vote for using mutable - using const_cast places a "spooky action at a distance" to the use of this type (it means you can't make a const instance of this type if you use this function - because const_casting away const on an actually const object is UB). While that might not be an issue in this particular class (it might be quite unlikely anyone would make a const instance of this type) - I'd still lean that way out of stylistic consistency with other use cases, etc.

Not a huge deal, though.

ruiu updated this revision to Diff 168488.Oct 5 2018, 10:37 AM
  • make member variables mutable instead of using const_cast to call uncompress()
lld/ELF/InputSection.cpp
152

I don't have a strong opinion, so I changed the code to not use const_cast but mutable.

MaskRay accepted this revision.Oct 5 2018, 6:15 PM
This revision is now accepted and ready to land.Oct 5 2018, 6:15 PM
This revision was automatically updated to reflect the committed changes.