Patch implements support of zlib style compressed sections.
SHF_COMPRESSED flag is used to recognize that decompression is required.
After that decompression is performed and flag is removed from output.
Details
Diff Detail
Event Timeline
This patch uncompresses all input sections whether they are going to be added to the output or not. A lot of sections are in fact not going to make it to the output because of comdat deduplication and section gc. So uncompressing all inputs beforehand would be a waste of resource.
Until the content is copied to the output, I think we don't have to uncompress inputs. Do you think you can do it lazily?
Until the content is copied to the output, I think we don't have to uncompress inputs. Do you think you can do it lazily?
It should be done on demand transparently. You only need to decompress input
when getting the input contents.
Also please try to minimize the number of data copies. In this patch, compressed data is uncompressed to a buffer in memory and then copied to an mmap'ed output file. So the data is copied twice. It can be just once -- you can uncompress data directly to the mmap'ed output.
It's also worth to mention that you could utilize multiple cores if you uncompress sections in writeTo() since writeTo() is parallelized for each input section when --thread is given. Since uncompressing gzipped data is fairly CPU intensive task, I'd expect it would make the linker noticeable faster when handling compressed sections.
Thank you for this and other comments about the patch. I want to assure you that this time I was aware about most of details you wrote :)
There were 2 reasons from my side why I implemented it in this way:
- MergeOutputSection requires early access to uncompressed data. It uses it in constructor, it uses it in InputSectionBase<ELFT>::getOffset() and in other places. I am currently investigating how it is possible to improve that. I am pretty sure we do not want to have different logic for compressed/uncompressed sections, so I hope to refactor how MergeOutputSection works at first then. I mean that we probably want to delay the access to data as much as we can here. That was technical problem I faced, but I was sure it is not a problem for implementation way I used because of next:
- I was thinking that since compressed sections are used for compressing debug stuff, then nobody really cares about perfomance. That is something I am missing probably, but if we are talking about linkage speed during development process, then I assume that compression probably not involved. Compression-decompression takes time, and hdd space it too much cheap to use that. But when we switch to generating "production" binary using some inputs with debugging sources, then I assumed there is not much sence to save few percents (probably) of final time if we can keep code simple instead. Please point me what I missing here.
Anyways, I am currently investigating the MergeOutputSection logic, I hope to be able to refactor it a bit. I am hard in commenting here, I still need to investigate that.
Do compressed and mergeable sections exist? If the combination of the two doesn't exist in the real world, we can simply reject that.
I think performance matters even with compressed debug info. For example, if you are doing distributed build, you may want to reduce the network traffic by compressing large debug sections to reduce the overall build time (sending small files is faster than larger ones).
So I mean that testcase for patch was created from real world code and it has that, unfortunately or not.
OK, but compressed sections shouldn't be uncompressed until they need to be. At the symbol resolution phase, we don't need any section contents (and at the phase and gc phase, a lot of sections are eliminated.)
Right now the first major place I am looking how to avoid is:
>template <class ELFT> void Writer<ELFT>::run() { > copyLocalSymbols(); >> includeInSymtab() ....
if (auto *S = dyn_cast<MergeInputSection<ELFT>>(D->Section)) if (S->getRangeAndSize(D->Value).first->second == MergeInputSection<ELFT>::PieceDead) return false; }
I probably do not want to comment it because still investigating it, I just have a guess that we can avoid uncompression here and just check if whole section is dead or not instead of if checking if PieceDead. I am not ready to continue that thoughts now, still looking at, I am not familar with that code yet :)
So just please let me to spend some time on this, I am sure will able to suggest something to improve this patch.
llvm-mc changes required for testcase of this were reverted in r270638. I am looking into it, will ping/update this one after resolving that.
- Updated testcase after r270987, which updated llvm-mc to work with both zlib, zlib-gnu. Now I think this patch is ready to commit.
ELF/InputSection.cpp | ||
---|---|---|
112 | UncompressedSize | |
113–114 | Cast to Elf{32,64}_Chdr and access ch_size member instead of hard-coding the offsets. | |
ELF/InputSection.h | ||
91 | I'd name this Uncompressed. It needs a comment. // If a section is compressed, this vector has uncompressed section data. | |
ELF/Writer.cpp | ||
1194–1195 | H->sh_flags & ~SHF_GROUP & ~SHF_COMPRESSED |
I'd name this Uncompressed.
It needs a comment.