This change affects the non-linker script case (precisely, when the
`SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192 KiB. The technique can be ported to other
targets.

Let me demonstrate the idea with a maxPageSize=65536 example:
When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.
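A minimal C++ sketch of the address/offset arithmetic in this example. The helpers alignTo and pickOffset are hypothetical (not lld's API), and the sketch assumes the previous PT_LOAD ends at 0x10020 in both address and file offset. pickOffset returns the smallest file offset at or after that end which is congruent to p_vaddr modulo maxPageSize.

```cpp
#include <cstdint>

// Round x up to a multiple of align (align must be a power of two).
constexpr uint64_t alignTo(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

// Hypothetical helper: smallest file offset >= minOffset with
// offset % maxPageSize == vaddr % maxPageSize (maxPageSize a power of two).
constexpr uint64_t pickOffset(uint64_t minOffset, uint64_t vaddr,
                              uint64_t maxPageSize) {
  uint64_t off = (minOffset & ~(maxPageSize - 1)) + (vaddr & (maxPageSize - 1));
  return off < minOffset ? off + maxPageSize : off;
}

constexpr uint64_t kMaxPageSize = 0x10000;
constexpr uint64_t kPrevEnd = 0x10020;  // end of the previous PT_LOAD (assumed vaddr == offset)

// Existing scheme: the new PT_LOAD starts at the next multiple of maxPageSize.
constexpr uint64_t kVaddr = alignTo(kPrevEnd, kMaxPageSize);
constexpr uint64_t kOffset = pickOffset(kPrevEnd, kVaddr, kMaxPageSize);

static_assert(kVaddr == 0x20000, "p_vaddr rounded up to maxPageSize");
static_assert(kOffset == 0x20000, "p_offset is forced to 0x20000 as well");
static_assert(kOffset - kPrevEnd == 0xffe0, "gap [0x10020, 0x20000) in the file");
```

The checks are static_asserts, so the example is verified at compile time.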

Alternatively, if we advance the position to 0x20020, the new PT_LOAD
will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for
p_offset! Obviously 0x10020 is the choice because it leaves no gap.
At runtime, p_vaddr will be rounded down to a multiple of pagesize
(0x20000 if pagesize=maxPageSize=65536), and p_offset likewise
(0x10000). This PT_LOAD will thus load additional initial contents from
the p_offset range [0x10000,0x10020), which will also be loaded by the
previous PT_LOAD. This is fine if -z noseparate-code is in effect or if
we are not transitioning between executable and non-executable segments.
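The alternative can be sketched the same way. Here the new p_vaddr 0x20020 is obtained by adding maxPageSize to the previous end, a hypothetical formulation chosen only for this example. The loader behaviour (rounding both p_vaddr and p_offset down to the page size) is modelled with a hypothetical alignDown helper, assuming pagesize=maxPageSize=65536.

```cpp
#include <cstdint>

// Round x down to a multiple of align (align must be a power of two).
constexpr uint64_t alignDown(uint64_t x, uint64_t align) {
  return x & ~(align - 1);
}

// Hypothetical helper: smallest file offset >= minOffset congruent to
// vaddr modulo maxPageSize.
constexpr uint64_t pickOffset(uint64_t minOffset, uint64_t vaddr,
                              uint64_t maxPageSize) {
  uint64_t off = alignDown(minOffset, maxPageSize) + (vaddr & (maxPageSize - 1));
  return off < minOffset ? off + maxPageSize : off;
}

constexpr uint64_t kPageSize = 0x10000;  // assumed: pagesize == maxPageSize
constexpr uint64_t kPrevEnd = 0x10020;   // end of the previous PT_LOAD

// Alternative scheme (hypothetical formulation): keep the low bits of the
// previous end and move into the next maxPageSize window.
constexpr uint64_t kVaddr = kPrevEnd + kPageSize;                      // 0x20020
constexpr uint64_t kOffset = pickOffset(kPrevEnd, kVaddr, kPageSize);  // 0x10020

// A loader maps the segment starting from page-aligned values.
constexpr uint64_t kMapVaddr = alignDown(kVaddr, kPageSize);    // 0x20000
constexpr uint64_t kMapOffset = alignDown(kOffset, kPageSize);  // 0x10000

static_assert(kOffset == kPrevEnd, "no gap in the file");
static_assert(kMapVaddr == 0x20000 && kMapOffset == 0x10000, "page-aligned mapping");
// File bytes [0x10000, 0x10020) are mapped by both PT_LOADs.
static_assert(kPrevEnd - kMapOffset == 0x20, "0x20 bytes loaded twice");
```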

ld.bfd -z noseparate-code leverages this technique to keep output small,
as can be observed in its output. This patch implements the technique in
lld, which is mostly effective on targets with a large defaultMaxPageSize
(AArch64/MIPS/PPC: 65536). With our default -z noseparate-code, it
removes 3 alignment boundaries, indicated by `|`:
`R | RX | RW(relro) | RW(non-relro)`. The 3 removed alignments can save
almost 3*65536 bytes.
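A back-of-the-envelope check of the "almost 3*65536 bytes" figure, as a hedged sketch: with the old scheme each boundary costs alignTo(end, maxPageSize) - end bytes of file padding, which is at most maxPageSize - 1. paddingAt is a hypothetical helper, and file offsets are assumed equal to addresses.

```cpp
#include <cstdint>

// Round x up to a multiple of align (align must be a power of two).
constexpr uint64_t alignTo(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

// Hypothetical helper: file padding the old scheme inserts at one PT_LOAD
// boundary when the previous PT_LOAD ends at `end`.
constexpr uint64_t paddingAt(uint64_t end, uint64_t maxPageSize) {
  return alignTo(end, maxPageSize) - end;
}

constexpr uint64_t kMaxPageSize = 65536;
constexpr uint64_t kBoundaries = 3;  // the three `|` in the diagram

// Worst case: each previous PT_LOAD ends one byte past a page boundary.
constexpr uint64_t kWorstPerBoundary = paddingAt(kMaxPageSize + 1, kMaxPageSize);

static_assert(kWorstPerBoundary == kMaxPageSize - 1, "at most maxPageSize-1 per boundary");
static_assert(kBoundaries * kWorstPerBoundary == 196605, "just under 3*65536 = 196608");
```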

Two places that rely on p_vaddr%pagesize = 0 have to be updated.

1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults
to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero.
The updated formula takes account of that factor.
2) Our TP offset formulae are only correct if p_vaddr%p_align = 0.
Fix them. See the updated comments in InputSection.cpp for details.
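A sketch of why the PT_GNU_RELRO size needs adjusting. It assumes a loader that protects the range from p_vaddr rounded down to p_vaddr+p_memsz rounded down to the page size, as glibc's ld.so does. relroMemszOld, relroMemszNew and protectedEnd are hypothetical helpers; relroMemszNew shows one way to take account of p_vaddr%commonPageSize (align the end rather than the size), not necessarily lld's exact formula.

```cpp
#include <cstdint>

constexpr uint64_t alignTo(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}
constexpr uint64_t alignDown(uint64_t x, uint64_t align) {
  return x & ~(align - 1);
}

// Old rule: round the size itself up to commonPageSize.
constexpr uint64_t relroMemszOld(uint64_t memsz, uint64_t pageSize) {
  return alignTo(memsz, pageSize);
}

// Sketch of a rule that accounts for p_vaddr % commonPageSize:
// round the *end* of PT_GNU_RELRO up instead of its size.
constexpr uint64_t relroMemszNew(uint64_t vaddr, uint64_t memsz, uint64_t pageSize) {
  return alignTo(vaddr + memsz, pageSize) - vaddr;
}

// End of the range a loader protects, assuming it rounds the end down.
constexpr uint64_t protectedEnd(uint64_t vaddr, uint64_t memsz, uint64_t pageSize) {
  return alignDown(vaddr + memsz, pageSize);
}

constexpr uint64_t kPage = 0x1000;    // commonPageSize
constexpr uint64_t kVaddr = 0x20020;  // p_vaddr % commonPageSize != 0
constexpr uint64_t kMemsz = 0x1000;   // RELRO data ends at 0x21020

// Old rule: 0x21020 rounds down to 0x21000, leaving [0x21000, 0x21020) writable.
static_assert(protectedEnd(kVaddr, relroMemszOld(kMemsz, kPage), kPage) == 0x21000, "");
// New rule: p_memsz becomes 0x1fe0 and the whole RELRO region is covered.
static_assert(relroMemszNew(kVaddr, kMemsz, kPage) == 0x1fe0, "");
static_assert(protectedEnd(kVaddr, relroMemszNew(kVaddr, kMemsz, kPage), kPage) == 0x22000, "");
```

When p_vaddr is page aligned, the two rules agree, so the change only matters for the new layouts.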

Fix TP offset computation for x86, PPC, RISC-V if p_vaddr%p_align != 0:

* x86: `st_value - p_memsz - ((-p_vaddr-p_memsz) & (p_align-1))`
  The end of PT_TLS (p_vaddr+p_memsz) rounded up to p_align has TP offset 0.
* PPC: `st_value + p_vaddr%p_align - 0x7000`
  p_vaddr rounded down to p_align has TP offset -0x7000. The first address
  of PT_TLS (p_vaddr) has TP offset (p_vaddr%p_align - 0x7000).
* RISC-V: `st_value + p_vaddr%p_align`

The change of static TLS block offsets is a no-op if p_vaddr%p_align = 0.
Tests for x86 and RISC-V will follow in subsequent patches.

On targets where we enable the technique (only PPC64 now), we can
potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if
`sh_addralign(.tdata) < sh_addralign(.tbss)`. This exposes many problems
in ld.so implementations, especially the offsets of dynamic TLS blocks.
Known issues:

* FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64)
* glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606
* musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)

So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).

Due to the large number of tests I have to fix (offset/address changes)
and potential risk, this technique is only enabled for PPC in this patch.
The technique will be enabled (with updated tests) for other targets in
subsequent patches.
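The three TP offset formulae listed above can be checked with a small C++ sketch. TlsPhdr is a hypothetical stand-in for the PT_TLS fields, and the numbers (p_vaddr=0x20010, p_memsz=0x28, p_align=0x20) are made up for illustration. The last assertions show that after rounding dot up to p_align(PT_TLS), so that p_vaddr%p_align = 0, the x86 formula reduces to st_value - alignTo(p_memsz, p_align) and the PPC/RISC-V ones lose their p_vaddr%p_align term.

```cpp
#include <cstdint>

// Hypothetical stand-in for the PT_TLS program header fields used below.
struct TlsPhdr {
  uint64_t p_vaddr;
  uint64_t p_memsz;
  uint64_t p_align;  // power of two
};

// x86: the end of PT_TLS rounded up to p_align has TP offset 0.
constexpr int64_t tpOffsetX86(uint64_t st_value, TlsPhdr tls) {
  return int64_t(st_value) - int64_t(tls.p_memsz) -
         int64_t((-tls.p_vaddr - tls.p_memsz) & (tls.p_align - 1));
}

// PPC: p_vaddr rounded down to p_align has TP offset -0x7000.
constexpr int64_t tpOffsetPPC(uint64_t st_value, TlsPhdr tls) {
  return int64_t(st_value + tls.p_vaddr % tls.p_align) - 0x7000;
}

// RISC-V: p_vaddr rounded down to p_align has TP offset 0.
constexpr int64_t tpOffsetRISCV(uint64_t st_value, TlsPhdr tls) {
  return int64_t(st_value + tls.p_vaddr % tls.p_align);
}

constexpr uint64_t alignTo(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

constexpr TlsPhdr kMisaligned{0x20010, 0x28, 0x20};  // p_vaddr % p_align == 0x10
static_assert(tpOffsetX86(0, kMisaligned) == -0x30, "0x20010 - 0x20040");
static_assert(tpOffsetPPC(0, kMisaligned) == 0x10 - 0x7000, "");
static_assert(tpOffsetRISCV(0, kMisaligned) == 0x10, "");

// After rounding dot up to p_align(PT_TLS), p_vaddr % p_align == 0 and the
// formulae reduce to their simpler forms.
constexpr TlsPhdr kAligned{alignTo(0x20010, 0x20), 0x28, 0x20};  // p_vaddr = 0x20020
static_assert(tpOffsetX86(8, kAligned) == 8 - int64_t(alignTo(0x28, 0x20)), "");
static_assert(tpOffsetPPC(8, kAligned) == 8 - 0x7000, "");
static_assert(tpOffsetRISCV(8, kAligned) == 8, "");
```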