Port the D64906 technique to AArch64. It removes 3 alignments at
PT_LOAD boundaries in the default case, so the size of an AArch64 binary
can decrease by up to 192 KiB.
If sh_addralign(.tdata) < sh_addralign(.tbss),
we can potentially make p_vaddr(PT_TLS)%p_align(PT_TLS) != 0.
ld.so implementations known to have problems when p_vaddr%p_align != 0:
- musl<=1.1.22
- FreeBSD 13.0-CURRENT (and before) rtld-elf arm64
The new test aarch64-tls-vaddr-align.s checks that p_vaddr%p_align == 0.
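
To make the failure mode concrete, here is a minimal sketch of how a weakly aligned .tdata followed by a more strictly aligned .tbss can leave PT_TLS with p_vaddr%p_align != 0. The Section struct and both helper names are hypothetical and are not lld's actual data structures. The sketch assumes p_align(PT_TLS) is the largest sh_addralign among the TLS sections and p_vaddr(PT_TLS) is the address of the first one.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an output section; only the fields needed here.
struct Section {
  uint64_t addr;      // sh_addr
  uint64_t addralign; // sh_addralign, a power of two
};

// Assumed rule: p_align(PT_TLS) is the largest sh_addralign of its sections.
uint64_t tlsAlign(const std::vector<Section> &secs) {
  uint64_t align = 1;
  for (const Section &s : secs)
    align = std::max(align, s.addralign);
  return align;
}

// Returns p_vaddr(PT_TLS) % p_align(PT_TLS), taking p_vaddr to be the
// address of the first TLS section. Expects a non-empty vector.
uint64_t tlsVaddrMisalignment(const std::vector<Section> &secs) {
  return secs.front().addr % tlsAlign(secs);
}
```

With .tdata at 0x10004 (sh_addralign 4) and .tbss at 0x10040 (sh_addralign 64), tlsVaddrMisalignment returns 4, which is the case the affected loaders mishandle. Moving .tdata to 0x10040 makes it return 0.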
The comment below is just an abbreviation of the code. Because our calculation is more complex than the one in ld.bfd or ld.gold (IIUC they force the alignment of the first TLS section to p_align, which means that p_offset will be 0 mod p_align), I think we need a bit more explanation. Something like:
I also needed to reach for my operator precedence table more than once when looking at the expression. It may be worth splitting it up a bit, for example: