This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Implemented --compress-debug-sections option.
ClosedPublic

Authored by grimar on Apr 11 2017, 8:50 AM.

Details

Summary

Patch implements --compress-debug-sections=zlib.

In compare with D20211 (a year old patch, abandoned), it implementation
uses streaming and fully reimplemented, does not support zlib-gnu for
simplification.

This is PR32308.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.Apr 11 2017, 8:50 AM
grimar edited reviewers, added: ruiu; removed: rui314.
ruiu edited edge metadata.Apr 11 2017, 9:41 AM

This patch doesn't use multiple cores to compress .debug section contents. In theory, if there are multiple .debug sections, we can compress them separately. Can you do that?

The code that writes section contents into an in-memory buffer looks very similar to code that writes sections to the output file buffer. This needs fixing.

The other approach I can think of is to do everything as a post-process. After writing everything to the output buffer, you scan .debug sections (that are at end of file), compress them in-place, and then shifts them to fill the holes created by data compression. Have you thought about that?

ELF/Driver.cpp
560 ↗(On Diff #94833)

What is -gabi?

562 ↗(On Diff #94833)

This is obvious, so remove this comment.

ELF/Options.td
25 ↗(On Diff #94833)

Why don't you use J?

ELF/OutputSections.cpp
74 ↗(On Diff #94833)

Why?

110 ↗(On Diff #94833)

What is 1024?

Is the default compression level the best for us?

ruiu added inline comments.Apr 11 2017, 10:52 AM
ELF/Driver.cpp
560 ↗(On Diff #94833)

I don't think zlib and zlib-gabi are the same. What you implemented in this patch is zlib-gabi, so don't handle zlib as a synonym for zlib-gabi.

ELF/OutputSections.cpp
309–315 ↗(On Diff #94833)

You can do this in finalize(), right?

test/ELF/compress-debug-sections.s
13–15 ↗(On Diff #94833)

Don't check for the section contents because zlib doesn't really guarantee the output.

27 ↗(On Diff #94833)

Ditto

29–48 ↗(On Diff #94833)

Why do you need this?

emaste added a subscriber: emaste.Apr 11 2017, 11:40 AM
In D31941#723864, @ruiu wrote:

This patch doesn't use multiple cores to compress .debug section contents. In theory, if there are multiple .debug sections, we can compress them separately. Can you do that?

zlib format specification (http://www.zlib.org/rfc-zlib.html, 2.2. Data format) says that it has header, some data and checksum after, it not just a piece of raw bytes that we can
concatenate and have the same result so simple. At the same time one of zlib authors answered how that can be implemented in short:
http://stackoverflow.com/questions/30794053/how-to-use-multiple-threads-for-zlib-compression-same-input-source
It looks a bit tricky because requires fixups of results (dropping bytes etc), but definetely should be possible.

If so it may be reasonable to implement threading compression API for LLVM and reuse it in LLD.

That probably means we can go with simple streaming imprementation for start and replace code in OutputSection::finalize() with threading solution when it be implemented. What do you think ?

The code that writes section contents into an in-memory buffer looks very similar to code that writes sections to the output file buffer. This needs fixing.

That should be solveable.

The other approach I can think of is to do everything as a post-process. After writing everything to the output buffer, you scan .debug sections (that are at end of file), compress them in-place, and then shifts them to fill the holes created by data compression. Have you thought about that?

I thought about different post-process and droped idea (it was estimate somehow max compressed size and compress in place in writeTo. If estimated size value minus real size value would be something not too large,
it could work. Do not know how it can be possible to do such estimation and anyways output would be larger than possible).

And your suggestion doesn't look too clean either, does it ? That would mean that we do not know final layout even during writing, what a bit wierd. And then either we need to fixup sections headers and write them after other data. Then do shifting and after that all truncate output file, right ? That is a bit too tricky for my eye. That may degradate perfomance either, probably.

ELF/Driver.cpp
560 ↗(On Diff #94833)

http://man7.org/linux/man-pages/man1/ld.1.html says:
--compress-debug-sections=zlib and
--compress-debug-sections=zlib-gabi compress DWARF debug sections with SHF_COMPRESSED from the ELF ABI. The default behaviour varies depending upon the target involved and the configure options used to build the toolchain.

https://docs.oracle.com/cd/E53394_01/html/E54813/man-cds.html says that there are 3 options: none, zlib, zlib-gnu.
It does not mention what is zlib-gabi.

The only compression flag available now in generic ABI is ELFCOMPRESS_ZLIB:
http://www.sco.com/developers/gabi/latest/ch4.sheader.html

So my conclusion from above:

  1. Since oracle does not mention zlib-gabi, I implemented zlib and not zlib-gabi.
  2. Since ld manual says both zlib/zlib-gabi compresses sections with SHF_COMPRESSED from ABI and only available type is ELFCOMPRESS_ZLIB, then zlib and zlib-gabi is the same.
ELF/Options.td
25 ↗(On Diff #94833)

Copypaste from def O: Joined<["-"] will fix.

ELF/OutputSections.cpp
74 ↗(On Diff #94833)

https://docs.oracle.com/cd/E36784_01/html/E36857/compressed_debug_sections.html
("To be a candidate for compression, a section must be non-allocable, and belong to ....")

https://docs.oracle.com/cd/E53394_01/html/E54813/section_compression.html
("SHF_COMPRESSED applies only to non-allocable sections, and cannot be used in conjunction with SHF_ALLOC. In addition, SHF_COMPRESSED cannot be applied to sections of type SHT_NOBITS.").

110 ↗(On Diff #94833)

1024 is a size of buffer used for output in zlib::StreamCompression implementation currently. It may be worth to try different values and benchmark LLD separatelly.

Regarding second question, zlib::StreamCompression indeed currently uses default compression level. What is best may probably depend on what we want. Speed or size or something in the middle. That also requires additional benchmarking probably and additional discussion based on benchmarking results. In current implemenation compression class (D31943) does not yet have API for setting level.

309–315 ↗(On Diff #94833)

Do you mean adding compression header to buffer ? If so, yes, I can.
(Sure I can't write something to output file there, output is not exist at that point).

ruiu added a comment.Apr 12 2017, 12:29 PM

What I was saying was that you can compress multiple sections concurrently, right? I didn't say you should try to compress single stream of data using multiple threads. I meant, since we have multiple .debug_ sections, you could compress them separately.

ELF/Driver.cpp
560 ↗(On Diff #94833)

When I give -compress-debug-sections=zlib and -compress-debug-sections=zlib-gabi, outputs from gold were actually different. But interestingly enough the difference wasn't meaningful. Gold seems to set garbage to ch_reserved members of compressed section headers. (Probably they forgot to initialize the header with zero before setting values to the members.)

So my conclusion is zlib-gabi and zlib are the same. But I don't see a reason to add two identical options at the moment. Please add just "zlib".

ELF/OutputSections.cpp
72–76 ↗(On Diff #94833)

The way this code is written is confusing. What you should state is that we want to compress .debug sections (which needless to say non-alloc sections).

// If -compress-debug-sections is specified, we compress output debug sections.
if (Config->CompressDebugSections && Name.startswith(".debug_") && !(Flags & SHF_ALLOC))
  this->Flags |= SHF_COMPRESSED;

However, this piece of code should probably be moved to finalize().

grimar updated this revision to Diff 95114.Apr 13 2017, 6:10 AM
  • Added multithreaded compression of output sections.
  • Removed streaming zlib compression.
  • Addressed review comments.
In D31941#725216, @ruiu wrote:

What I was saying was that you can compress multiple sections concurrently, right? I didn't say you should try to compress single stream of data using multiple threads. I meant, since we have multiple .debug_ sections, you could compress them separately.

OK, I understood differently. Thought you're talking about multithreaded compression for input .debug* sections.

ELF/Driver.cpp
560 ↗(On Diff #94833)

Done.

562 ↗(On Diff #94833)

Done.

ELF/Options.td
25 ↗(On Diff #94833)

Fixed.

ELF/OutputSections.cpp
72–76 ↗(On Diff #94833)

Ok. FWIW spec states that debug sections are ".debug*" and not ".debug_*"
(https://docs.oracle.com/cd/E53394_01/html/E54813/man-cds.html)
Though I think your way should work anyways.

It also mentiones other namings:
Debug sections are identified by having a .compcom, .line, .stab*, .debug*, or .zdebug* section name.

I would start only from .debug ones in this patch.

test/ELF/compress-debug-sections.s
13–15 ↗(On Diff #94833)

I reimplemented this test in a more smart way.

grimar edited the summary of this revision. (Show Details)Apr 13 2017, 6:19 AM
ruiu added inline comments.Apr 13 2017, 10:33 AM
ELF/OutputSections.cpp
90 ↗(On Diff #95114)

I think it is better to template this function with ELFT so that you can access members by name using ELFT::Chdr type.

115 ↗(On Diff #95114)

Rename this maybeComrpess as it does't always compress contents.

143–147 ↗(On Diff #95114)

Move this to OutputSection::compress().

ELF/OutputSections.h
92–93 ↗(On Diff #95114)

CompressedHeader can be part of CompressedData.

95–96 ↗(On Diff #95114)

You don't need this, I guess?

ruiu added inline comments.Apr 13 2017, 1:28 PM
ELF/Options.td
26 ↗(On Diff #95114)

Compresses -> Compress

grimar updated this revision to Diff 95285.Apr 14 2017, 3:34 AM
  • Addressed review comments.
ELF/Options.td
26 ↗(On Diff #95114)

Done.

ELF/OutputSections.cpp
90 ↗(On Diff #95114)

Done.

115 ↗(On Diff #95114)

Done.

143–147 ↗(On Diff #95114)

Done.

ELF/OutputSections.h
92–93 ↗(On Diff #95114)

That what I wanted to do, but I think there is no way to do that in effective way with current zlib::compress implementation.

It takes buffer for output, resizes and fills on its side. There is no way to say: "I need some room at start".
As a result after doing compression if you want to add some header all section content should be shifted and it can affect perfomance if section is huge.

So I did them as separate fields for now.

95–96 ↗(On Diff #95114)

Right. Removed.

ruiu accepted this revision.Apr 14 2017, 11:29 AM

LGTM

There is a room for improvements, but I'll do that after you submit this.

ELF/OutputSections.cpp
99–109 ↗(On Diff #95285)

You are not accessing these members by name.

This revision is now accepted and ready to land.Apr 14 2017, 11:29 AM
ruiu added inline comments.Apr 14 2017, 12:23 PM
ELF/OutputSections.cpp
128 ↗(On Diff #95285)

I believe this function returns an Error. Make sure you handle an error before submitting.

This revision was automatically updated to reflect the committed changes.

Thanks for review, Rui !

ELF/OutputSections.cpp
99–109 ↗(On Diff #95285)

Ah. I misread your comment at first, sorry. Fixed in commit.

128 ↗(On Diff #95285)

Done.

grimar added inline comments.Apr 17 2017, 2:22 AM
lld/trunk/ELF/OutputSections.cpp
110

I think we can simplify this to nex one now ?

if (Config->Is64)
  Buf += sizeof(Elf64_Word);
write(Buf, (typename ELFT::uint)Size, E);
Buf += sizeof(ELFT::Chdr::ch_size);
write(Buf, (typename ELFT::uint)Alignment, E);
Buf += sizeof(ELFT::Chdr::ch_addralign);