D58892 splits the RW PT_LOAD on the PT_GNU_RELRO boundary. The new
PT_LOAD triggers:
if (p->p_type == PT_LOAD && p->firstSec) pageAlign(p->firstSec);
which makes the pageAlign at PT_GNU_RELRO boundaries redundant.
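The redundancy argument can be illustrated with a small model. The sketch below is hypothetical: the names Sec, loadStarts, relroBoundaries and relroAlignRedundant are illustrative and are not lld code. It tracks only executable, writable and RELRO flags, and ignores TLS, partitions, linker scripts and --no-rosegment. It checks whether every PT_GNU_RELRO boundary section is already the first section of some PT_LOAD, with and without the D58892 split.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified section model; not lld's OutputSection.
struct Sec {
  std::string name;
  bool exec;   // SHF_EXECINSTR
  bool write;  // SHF_WRITE
  bool relro;  // covered by PT_GNU_RELRO
};

// Indices of sections that start a new PT_LOAD. A new PT_LOAD starts when
// the permissions change; with splitOnRelro (modelling D58892) it also
// starts where adjacent writable sections differ in RELRO-ness.
std::vector<std::size_t> loadStarts(const std::vector<Sec> &secs,
                                    bool splitOnRelro) {
  std::vector<std::size_t> starts;
  for (std::size_t i = 0; i < secs.size(); ++i) {
    if (i == 0) {
      starts.push_back(i);
      continue;
    }
    const Sec &a = secs[i - 1];
    const Sec &b = secs[i];
    bool permChange = a.exec != b.exec || a.write != b.write;
    bool relroSplit = splitOnRelro && a.write && b.write && a.relro != b.relro;
    if (permChange || relroSplit)
      starts.push_back(i);
  }
  return starts;
}

// Indices of sections at a PT_GNU_RELRO boundary: the first RELRO section
// and the first section after the RELRO region.
std::vector<std::size_t> relroBoundaries(const std::vector<Sec> &secs) {
  std::vector<std::size_t> bounds;
  for (std::size_t i = 0; i < secs.size(); ++i) {
    bool prevRelro = i > 0 && secs[i - 1].relro;
    if (secs[i].relro != prevRelro)
      bounds.push_back(i);
  }
  return bounds;
}

// True if every RELRO boundary is already the first section of a PT_LOAD,
// so the loop that page-aligns each PT_LOAD's first section covers it and a
// separate pageAlign at the boundary adds nothing.
bool relroAlignRedundant(const std::vector<Sec> &secs, bool splitOnRelro) {
  std::vector<std::size_t> starts = loadStarts(secs, splitOnRelro);
  for (std::size_t i : relroBoundaries(secs))
    if (std::find(starts.begin(), starts.end(), i) == starts.end())
      return false;
  return true;
}
```

For a layout such as .rodata, .text, .got (RELRO), .data, relroAlignRedundant returns true with the split and false without it: in the unsplit model .data is not the first section of any PT_LOAD, which is the case the boundary-specific pageAlign used to handle.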
Differential D64854: [ELF] Delete redundant pageAlign at PT_GNU_RELRO boundaries after D58892
Authored by MaskRay on Jul 17 2019, 12:55 AM.
Currently we align p_vaddr to the next multiple of max-page-size, instead of using ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) as ld.bfd does in its -z noseparate-code mode and in some cases in its -z separate-code mode. See the annotations in the following ld.lld -z max-page-size=0x200000 output:

[11] .rodata PROGBITS 0000000000000558 000558 000004 04 AM 0 0 4
[12] .eh_frame_hdr PROGBITS 000000000000055c 00055c 00002c 00 A 0 0 4
[13] .eh_frame PROGBITS 0000000000000588 000588 0000cc 00 A 0 0 8
/// Gap due to separated R-- and R-X segments.
/// This gap can be saved in --no-rosegment mode.
[14] .text PROGBITS 0000000000200000 200000 000161 00 AX 0 0 16
[15] .init PROGBITS 0000000000200164 200164 000017 00 AX 0 0 4
[16] .fini PROGBITS 000000000020017c 20017c 000009 00 AX 0 0 4
[17] .plt PROGBITS 0000000000200190 200190 000020 00 AX 0 0 16
[18] .fini_array FINI_ARRAY 0000000000400000 400000 000008 08 WA 0 0 8
[19] .init_array INIT_ARRAY 0000000000400008 400008 000008 08 WA 0 0 8
[20] .dynamic DYNAMIC 0000000000400010 400010 0001a0 10 WA 8 0 8
[21] .got PROGBITS 00000000004001b0 4001b0 000028 00 WA 0 0 8
[22] .bss.rel.ro NOBITS 00000000004001d8 4001d8 000000 00 WA 0 0 1
/// Gap due to PT_GNU_RELRO. It wastes almost 0x200000 bytes.
/// If we change p_vaddr of the RW PT_LOAD from 0x600000 to 0x6001d8, its p_offset doesn't need to be aligned, and we can save nearly 0x200000 bytes in the file.
[23] .data PROGBITS 0000000000600000 600000 000010 00 WA 0 0 8
[24] .tm_clone_table PROGBITS 0000000000600010 600010 000000 00 WA 0 0 8
[25] .got.plt PROGBITS 0000000000600010 600010 000020 00 WA 0 0 8
[26] .bss NOBITS 0000000000600030 600030 000001 00 WA 0 0 1

We perform several max-page-size alignments for this file, and each costs nearly 0x200000 bytes. If we do the optimization described in the annotations, we can save a lot of disk space. This is particularly relevant to targets with a large defaultMaxPageSize (65536 on AArch64, MIPS (@atanasyan), and PPC (@sfertile)). What do you think of this trick?

Could you explain what ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) means? It looks like it aligns to the next multiple of MAXPAGESIZE (plus (. & (CONSTANT (MAXPAGESIZE) - 1))), so doesn't it consume a MAXPAGESIZE?

Say the end address of the last segment is 0x4001d8. The PT_LOAD covers the address range [0x400000, 0x4001d8) (at runtime, the end address will be rounded up to a page boundary). Currently we set the address of the new segment to the next multiple of max-page-size: 0x600000. Because p_offset and p_vaddr must be congruent modulo max-page-size, we have to set its p_offset to 0x600000. However, if we set the address of the new segment to 0x6001d8, its p_offset can be kept as 0x4001d8. This saves 0x600000-0x4001d8 bytes.

The PT_LOAD may cover [0x6001d8, 0x6002d8). At runtime, it will become [0x600000, 0x601000) if the actual page size is 0x1000. The address range [0x600000, 0x6002d8) (file offsets [0x400000, 0x4002d8)) is then shared with the previous PT_LOAD segment.

In the BFD -z separate-code case (similar to our default case), BFD does not want code to be shared with adjacent PT_LOAD segments, so it is probably not desirable to apply this trick to R-X. If, however, --no-rosegment is specified, we can apply the trick to R-X.

Thanks for the explanation. That optimization makes sense by itself, but I'm not sure how often you want to use --no-rosegment. Is that frequently used?

I think BFD uses this at least once for all programs.
As I understand it, BFD with its builtin linker script uses something like:

.gnu_extab : ONLY_IF_RO { *(.gnu_extab*) }
/* These sections are generated by the Sun/Oracle C++ compiler. */
.exception_ranges : ONLY_IF_RO { *(.exception_ranges*) }
/* Adjust the address for the data segment. We want to adjust up to
the same address within the page on the next page up. */
. = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
/* Exception handling */
.eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) }
.gnu_extab : ONLY_IF_RW { *(.gnu_extab) }
.gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
.exception_ranges : ONLY_IF_RW { *(.exception_ranges*) }
/* Thread Local Storage sections */
.tdata :
{
PROVIDE_HIDDEN (__tdata_start = .);
*(.tdata .tdata.* .gnu.linkonce.td.*)
}

The DATA_SEGMENT_ALIGN function is defined in https://sourceware.org/binutils/docs/ld/Builtin-Functions.html#Builtin-Functions as equivalent to either (ALIGN(maxpagesize) + (. & (maxpagesize - 1))) or (ALIGN(maxpagesize) Our implementation of DATA_SEGMENT_ALIGN is just . = ALIGN(maxpagesize); In the past, the reason given for not using these tricks has been keeping things simple, which is understandable when the goal was getting linker scripts to work at all. Now that we have a more stable base and more tests, I'm in favour of introducing more of these optimisations.
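To make the arithmetic in this thread concrete, here is a minimal sketch, assuming a power-of-two max-page-size and that the previous segment's file offset equals its address (as in the 0x4001d8 example). The helper names (alignTo, alignDown, lldStart, bfdStart, minFileOffset) are hypothetical, and only the first DATA_SEGMENT_ALIGN form quoted above is modelled.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Round x up to a multiple of align (align must be a power of two).
inline u64 alignTo(u64 x, u64 align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (x + align - 1) & ~(align - 1);
}

// Round x down to a multiple of align (align must be a power of two).
inline u64 alignDown(u64 x, u64 align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return x & ~(align - 1);
}

// Current ld.lld behaviour: the next segment starts at ALIGN(maxpagesize).
inline u64 lldStart(u64 dot, u64 maxPageSize) {
  return alignTo(dot, maxPageSize);
}

// First DATA_SEGMENT_ALIGN form from the binutils manual:
// ALIGN(maxpagesize) + (. & (maxpagesize - 1))
inline u64 bfdStart(u64 dot, u64 maxPageSize) {
  return alignTo(dot, maxPageSize) + (dot & (maxPageSize - 1));
}

// Smallest p_offset >= prevEndOffset with
// p_offset % maxPageSize == p_vaddr % maxPageSize.
// Unsigned wrap-around is harmless because maxPageSize divides 2^64.
inline u64 minFileOffset(u64 vaddr, u64 prevEndOffset, u64 maxPageSize) {
  return prevEndOffset + ((vaddr - prevEndOffset) & (maxPageSize - 1));
}
```

With max-page-size 0x200000 and dot 0x4001d8, lldStart gives 0x600000 (minimum file offset 0x600000), while bfdStart gives 0x6001d8 (minimum file offset 0x4001d8), a saving of 0x600000-0x4001d8 bytes. With 0x1000 runtime pages, alignDown(0x6001d8, 0x1000) and alignTo(0x6002d8, 0x1000) give the mapped range [0x600000, 0x601000).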