This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Add --compress-sections
Changes Planned · Public

Authored by MaskRay on Jul 6 2023, 12:17 PM.

Details

Summary

--compress-sections <section-glob>=[zlib|zstd] is like a generalized
--compress-debug-sections that applies to arbitrary sections, including
SHF_ALLOC ones. This option has a number of candidate use cases for metadata
sections, including:

For SHF_ALLOC use cases, a supporting runtime library can identify the section
content with a pair of symbols __start_<sectionname> and
__stop_<sectionname> and check the header to know whether it is compressed or
not. There are some caveats:

  • We compute the section content/size once in finalizeAddressDependentContent before compression. If the content or size changes, the compressed content will be invalid, but we don't detect changed content (e.g., data commands). However, we detect size changes in assignOffsets.
  • If there are dynamic relocations, rtld does not skip them, which will cause a runtime crash or corrupt writable data. In general, label differences should be used (see foo0 in the test) and the runtime library needs to adjust the differences.
  • Symbols defined relative to the output section designate offsets into the uncompressed content.
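
As a rough illustration of the runtime-library side, the sketch below parses an Elf64_Chdr-style header placed in front of the payload and decompresses the rest. It is a toy model in Python rather than real runtime code, and several things are assumptions: the little-endian layout, the `magic` prefix used to recognize uncompressed content (the summary does not say how a runtime tells the two cases apart), and the helper names `make_compressed` and `read_section`. Only zlib is handled here.

```python
import struct
import zlib

ELFCOMPRESS_ZLIB = 1  # ch_type value defined by the ELF gABI
# Elf64_Chdr: ch_type (u32), ch_reserved (u32), ch_size (u64), ch_addralign (u64).
# Little-endian is an assumption of this sketch.
CHDR64 = struct.Struct("<IIQQ")


def make_compressed(payload: bytes, align: int = 1) -> bytes:
    """Build header + zlib stream, mimicking what a linker would emit."""
    header = CHDR64.pack(ELFCOMPRESS_ZLIB, 0, len(payload), align)
    return header + zlib.compress(payload)


def read_section(blob: bytes, magic: bytes) -> bytes:
    """Return the uncompressed content of the bytes between
    __start_<sectionname> and __stop_<sectionname>.

    `magic` is a hypothetical marker that uncompressed content starts with;
    a real runtime library would pick its own detection scheme.
    """
    if blob.startswith(magic):
        return blob  # section was not compressed
    ch_type, _reserved, ch_size, _align = CHDR64.unpack_from(blob)
    if ch_type != ELFCOMPRESS_ZLIB:
        raise ValueError(f"unsupported ch_type {ch_type}")
    data = zlib.decompress(blob[CHDR64.size:])
    if len(data) != ch_size:
        raise ValueError("ch_size does not match decompressed length")
    return data
```

Comparing the decompressed length with ch_size is a cheap sanity check on the stream.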

In addition, compressing synthetic sections like .symtab/.strtab and regular
data/code sections will be problematic, but we don't report an error.

GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=27452

Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674
Link: https://groups.google.com/g/generic-abi/c/HUVhliUrTG0 ("Allow SHF_ALLOC | SHF_COMPRESSED sections")

Event Timeline

MaskRay created this revision. Jul 6 2023, 12:17 PM
Herald added a project: Restricted Project. Jul 6 2023, 12:17 PM
Herald added a subscriber: emaste.
MaskRay requested review of this revision. Jul 6 2023, 12:17 PM

The code overall looks OK, but do I understand correctly that the discussion on the Generic System V ABI mailing list is not finished yet?

[ELF] Add --compress-ections

Please, don't forget to fix the typo in the title and in the first line of the description.

lld/ELF/OutputSections.cpp
333–338
461–463

The comment needs to be updated

lld/ELF/Writer.cpp
539–540

This comment should be removed

1618–1620

A few words about how this requirement is enforced would be great.

By the way, where does this requirement come from? Do you think that code sections shouldn't be compressed at all, or only those that need thunks or other fixes? Why?

MaskRay updated this revision to Diff 537983. Jul 6 2023, 9:58 PM
MaskRay marked 3 inline comments as done.
MaskRay retitled this revision from [ELF] Add --compress-ections to [ELF] Add --compress-sections.
MaskRay edited the summary of this revision.

thanks for the quick comments!

MaskRay updated this revision to Diff 537984. Jul 6 2023, 9:59 PM
MaskRay marked an inline comment as done.

remove a stale comment

The code overall looks OK, but do I understand correctly that the discussion on the Generic System V ABI mailing list is not finished yet?

Not finished yet. I think there are some misunderstandings. foo0 and nonalloc0 in lld/test/ELF/compress-sections.s demonstrate possible metadata section uses.
write0 (dynamic relocations) is problematic and should be avoided.

Just created a write-up about the caveats of SHF_ALLOC|SHF_COMPRESSED sections and other things: https://maskray.me/blog/2023-07-07-compressed-arbitrary-sections :)

We compute the section content/size once in finalizeAddressDependentContent before compression. If the content or size changes, the compressed content will be invalid, but we don't detect changed content (e.g., data commands). However, we detect size changes in assignOffsets.

I guess this means that if the writeTo() has any relocations they won't work with compression. The presence of relocations or possibly use of one of the relocate functions could generate an error. It probably wouldn't be intuitive to a user, but would protect them from wasting hours wondering why their data was corrupt (I'm assuming few people read the documentation). Off the top of my head "Cannot compress <output section>, <input section> from <object> contains relocations."

In armlink which does read-write data compression, we have this rather complicated scheme:

  • Allocate final VMA addresses, with predictions for LMA.
  • Filter out relocations (in non-compressed sections) to linker-defined symbols that depend on a compressed address. This is easier in armlink because linker-defined symbols are heavily constrained.
  • Resolve relocations.
  • Compress RW data.
  • Allocate post-compression addresses. VMAs remain the same; LMA addresses may change.
  • Resolve the filtered relocations.
This adds considerable complexity though.
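
The ordering of those phases can be modelled in a few lines. This is only a toy sketch of the idea, not armlink's implementation: relocations are reduced to a name-to-symbol map, zlib stands in for whatever compressor is actually used, and the symbol names (`rw_vma_base`, `rw_vma_limit`, `rw_load_limit`) and the function `link_with_deferred_relocs` are hypothetical.

```python
import zlib

RW_VMA = 0x10000000  # run-time address of the RW data (final before compression)


def link_with_deferred_relocs(rw_data: bytes, relocs: dict, rw_lma: int) -> dict:
    """relocs maps a relocation name to the symbol it references.

    All relocations here are assumed to live in non-compressed sections.
    """
    # Step 1: VMAs are final and use uncompressed sizes.
    symbols = {"rw_vma_base": RW_VMA, "rw_vma_limit": RW_VMA + len(rw_data)}
    # Hypothetical linker-defined symbol whose value depends on compression.
    load_dependent = {"rw_load_limit"}

    # Step 2: set aside relocations that need a post-compression address.
    deferred = {r: s for r, s in relocs.items() if s in load_dependent}
    # Step 3: resolve everything else.
    resolved = {r: symbols[s] for r, s in relocs.items() if s not in load_dependent}
    # Step 4: compress the RW data; its bytes are final at this point.
    compressed = zlib.compress(rw_data)
    # Step 5: load addresses are known only now.
    symbols["rw_load_limit"] = rw_lma + len(compressed)
    # Step 6: resolve the relocations that were set aside.
    resolved.update({r: symbols[s] for r, s in deferred.items()})
    return resolved
```

The key property is that nothing resolved in step 3 can be invalidated by step 5, because only load addresses move.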

Not had a chance to go through the code and tests yet, been a very busy week. Will try and do that as soon as possible.

MaskRay planned changes to this revision. Jul 7 2023, 9:49 AM

We compute the section content/size once in finalizeAddressDependentContent before compression. If the content or size changes, the compressed content will be invalid, but we don't detect changed content (e.g., data commands). However, we detect size changes in assignOffsets.

I guess this means that if the writeTo() has any relocations they won't work with compression. The presence of relocations or possibly use of one of the relocate functions could generate an error. It probably wouldn't be intuitive to a user, but would protect them from wasting hours wondering why their data was corrupt (I'm assuming few people read the documentation). Off the top of my head "Cannot compress <output section>, <input section> from <object> contains relocations."

I agree. The current compress-once approach has a severe limitation and is error-prone. Worse, it does not consider thunks:

  • The uncompressed section content decides the compressed section size.
  • The compressed section size affects addresses of subsequent sections and symbol assignments. The affected sections include text sections that use range extension thunks.
  • Subsequent sections and symbol assignments may affect the uncompressed section content.
      • PC-relative references to text sections (e.g., .quad .text.foo-.) change values when the text section address changes.
      • Data commands in an output section description may change.
      • Location counter increments (e.g., . += expr;) in an output section description may change.
SECTIONS {
  ...
  foo : { *(foo*) QUAD(expr1) . += expr2; }
}
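
A toy model of this circular dependency, assuming a section whose last 8 bytes hold the address of whatever follows it (similar in spirit to a QUAD(expr1) whose expression depends on a later address). This is not LLD code; `BASE`, `section_content`, and `compress_once` are made up for illustration, and zlib stands in for either compression format.

```python
import struct
import zlib

BASE = 0x1000  # hypothetical address of the compressed section


def section_content(next_addr: int) -> bytes:
    # Payload followed by an 8-byte field that depends on the address
    # of the section placed after this one.
    return b"metadata" * 32 + struct.pack("<Q", next_addr)


def compress_once() -> tuple[int, int]:
    """Return (address embedded in the content, actual address after compression)."""
    # Layout pass: at this point the section still has its uncompressed size.
    predicted_next = BASE + len(section_content(0))
    compressed = zlib.compress(section_content(predicted_next))
    # Compression shrinks the section, so the following section moves down.
    actual_next = BASE + len(compressed)
    embedded = struct.unpack("<Q", zlib.decompress(compressed)[-8:])[0]
    return embedded, actual_next
```

After a single compression pass the embedded value still reflects the pre-compression layout, which is the stale-content problem described above.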

In armlink which does read-write data compression, we have this rather complicated scheme:

  • Allocate final VMA addresses, with predictions for LMA.
  • Filter out relocations (in non-compressed sections) to linker-defined symbols that depend on a compressed address. This is easier in armlink because linker-defined symbols are heavily constrained.
  • Resolve relocations.
  • Compress RW data.
  • Allocate post-compression addresses. VMAs remain the same; LMA addresses may change.
  • Resolve the filtered relocations.

This adds considerable complexity though.

Not had a chance to go through the code and tests yet, been a very busy week. Will try and do that as soon as possible.

I am curious how final VMA addresses are determined. Don't relocations in the uncompressed section content affect the compressed section size?

I am curious how final VMA addresses are determined. Don't relocations in the uncompressed section content affect the compressed section size?

In armlink the assumption is that everything at a VMA operates on uncompressed data, with the decompressor running very early in the startup sequence so that all running code only sees uncompressed data. Compressed data exists only at the LMA.

I guess in LLD we are making life hard for ourselves by having user code do the decompression on demand, rather than insisting that everything is decompressed at once by startup code.

In armlink scatter file notation:

ER_RO 0x8000 {
# all read-only sections
  *(+ro)
}
ER_RW 0x10000000 {
# all read-write sections, implicitly marked for compression
  *(+rw)
}
ER_ZI +0 {
# zero initialized data follows (in VMA) after compressed RW
}

The ER_RW load size depends on compression, but its run-time size is always the uncompressed size.

At startup, the first routine sets up a stack, usually using the space reserved for ZI, then it calls the routine to decompress all the data from LMA to VMA. User code can't easily get at the compressed data after that time.
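
The load-view versus run-time-view split can be sketched as a small simulation. It is only a model of the behaviour described above: zlib stands in for armlink's own compression, the `load_image` dictionary and the `startup` function are hypothetical, and stack setup is omitted.

```python
import zlib

RO_VMA = 0x8000
RW_VMA = 0x10000000


def startup(load_image: dict, zi_size: int) -> dict:
    """Turn a load image {"ro": bytes, "rw_compressed": bytes} into a
    run-time memory map {VMA: bytes}."""
    # Decompress RW data from its load address to its execution address.
    rw = zlib.decompress(load_image["rw_compressed"])
    return {
        RO_VMA: load_image["ro"],
        RW_VMA: rw,
        # ER_ZI +0: zero-initialized data directly follows RW in the VMA view.
        RW_VMA + len(rw): bytes(zi_size),
    }
```

In this model the run-time size of the RW region is always the uncompressed size, and only the load image depends on how well the data compresses.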