D58892 splits the RW PT_LOAD on the PT_GNU_RELRO boundary. The new
PT_LOAD triggers:
if (p->p_type == PT_LOAD && p->firstSec) pageAlign(p->firstSec);
which makes the pageAlign at PT_GNU_RELRO boundaries redundant.
Differential D64854
[ELF] Delete redundant pageAlign at PT_GNU_RELRO boundaries after D58892
Authored by MaskRay on Jul 17 2019, 12:55 AM.
Event Timeline

Comment:
Currently we align p_vaddr to the next multiple of max-page-size, instead of using ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) as ld.bfd does in its -z noseparate-code mode and in some cases in its -z separate-code mode. See the comments in the listing below for an ld.lld -z max-page-size=0x200000 case:

    [11] .rodata         PROGBITS   0000000000000558 000558 000004 04  AM 0 0  4
    [12] .eh_frame_hdr   PROGBITS   000000000000055c 00055c 00002c 00   A 0 0  4
    [13] .eh_frame       PROGBITS   0000000000000588 000588 0000cc 00   A 0 0  8
    ////// gap due to separated R-- and R-X
    ///// This gap can be saved in --no-rosegment mode.
    [14] .text           PROGBITS   0000000000200000 200000 000161 00  AX 0 0 16
    [15] .init           PROGBITS   0000000000200164 200164 000017 00  AX 0 0  4
    [16] .fini           PROGBITS   000000000020017c 20017c 000009 00  AX 0 0  4
    [17] .plt            PROGBITS   0000000000200190 200190 000020 00  AX 0 0 16
    [18] .fini_array     FINI_ARRAY 0000000000400000 400000 000008 08  WA 0 0  8
    [19] .init_array     INIT_ARRAY 0000000000400008 400008 000008 08  WA 0 0  8
    [20] .dynamic        DYNAMIC    0000000000400010 400010 0001a0 10  WA 8 0  8
    [21] .got            PROGBITS   00000000004001b0 4001b0 000028 00  WA 0 0  8
    [22] .bss.rel.ro     NOBITS     00000000004001d8 4001d8 000000 00  WA 0 0  1
    /// Gap due to PT_GNU_RELRO. It wastes almost 0x200000 bytes.
    /// If we change p_vaddr of the RW PT_LOAD from 0x600000 to 0x6001d8, its p_offset doesn't need to be aligned, and we can save nearly 0x200000 bytes in the file.
    [23] .data           PROGBITS   0000000000600000 600000 000010 00  WA 0 0  8
    [24] .tm_clone_table PROGBITS   0000000000600010 600010 000000 00  WA 0 0  8
    [25] .got.plt        PROGBITS   0000000000600010 600010 000020 00  WA 0 0  8
    [26] .bss            NOBITS     0000000000600030 600030 000001 00  WA 0 0  1

We perform several max-page-size alignments for this file, and each costs nearly 0x200000 bytes. If we do the optimization described in the listing comments, we can save a lot of disk space.
This is particularly relevant to targets with a large defaultMaxPageSize (AArch64, MIPS (@atanasyan), and PPC (@sfertile): 65536). What do you think of this trick?

Comment:
Could you explain what ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) means? It looks like it aligns to the next multiple of MAXPAGESIZE (plus (. & (CONSTANT (MAXPAGESIZE) - 1))), so doesn't it consume a MAXPAGESIZE?

Comment:
Say the end address of the last segment is 0x4001d8. The PT_LOAD covers the address range [0x400000, 0x4001d8) (at runtime, the end address will be rounded up). Currently we set the address of the new segment to the next multiple of max-page-size: 0x600000. Because p_offset and p_vaddr must be equal modulo max-page-size, we have to set its p_offset to 0x600000. However, if we set the address of the new segment to 0x6001d8, its p_offset can be kept as 0x4001d8. This saves 0x600000-0x4001d8 bytes.

The PT_LOAD may cover [0x6001d8, 0x6002d8). At runtime, it will become [0x600000, 0x601000) if the actual page size is 0x1000. The address range [0x600000, 0x6002d8) (with file offsets: [0x400000,0x4002d8)) is shared with the previous PT_LOAD segment.

In the BFD -z separate-code case (similar to our default case), they don't want code to be shared with adjacent PT_LOAD segments, so it is probably not desirable to apply this trick to R-X. If, however, --no-rosegment is specified, we can apply the trick to R-X.

Comment:
Thanks for the explanation. The optimization itself makes sense, but I'm not sure how often you want to use --no-rosegment. Is that frequently used?

Comment:
I think BFD uses this at least once for all programs. As I understand it, BFD with its built-in linker script uses something like:

    .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) }
    /* These sections are generated by the Sun/Oracle C++ compiler. */
    .exception_ranges : ONLY_IF_RO { *(.exception_ranges*) }
    /* Adjust the address for the data segment. We want to adjust up to
       the same address within the page on the next page up. */
    . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
    /* Exception handling */
    .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) }
    .gnu_extab : ONLY_IF_RW { *(.gnu_extab) }
    .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
    .exception_ranges : ONLY_IF_RW { *(.exception_ranges*) }
    /* Thread Local Storage sections */
    .tdata : { PROVIDE_HIDDEN (__tdata_start = .); *(.tdata .tdata.* .gnu.linkonce.td.*) }

The DATA_SEGMENT_ALIGN function is defined in https://sourceware.org/binutils/docs/ld/Builtin-Functions.html#Builtin-Functions as doing either:

    (ALIGN(maxpagesize) + (. & (maxpagesize - 1)))

or

    (ALIGN(maxpagesize)

Our implementation of DATA_SEGMENT_ALIGN is just

    . = ALIGN(maxpagesize);

In the past, the reason given for not using these tricks has been keeping things simple, which is understandable when getting linker scripts to work at all. Now that we have a more stable base and more tests, I'm in favour of introducing more of the optimisations.
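
The address arithmetic from the thread can be checked with a small C++ sketch. It compares the two placements discussed above for the example values (max-page-size 0x200000, previous segment ending at address and file offset 0x4001d8). The helper names (alignTo, nextPageAligned, bfdStyle, nextFileOffset) are made up for illustration and are not lld's actual code; the only rule assumed is the one stated in the thread, that p_offset and p_vaddr must be equal modulo max-page-size.

```cpp
#include <cstdint>

// Round v up to a multiple of align (align must be a power of two).
constexpr uint64_t alignTo(uint64_t v, uint64_t align) {
  return (v + align - 1) & ~(align - 1);
}

// Behaviour described in the thread as current: start the next PT_LOAD at
// the next multiple of max-page-size.
constexpr uint64_t nextPageAligned(uint64_t dot, uint64_t maxPage) {
  return alignTo(dot, maxPage);
}

// The ld.bfd expression: ALIGN(maxpagesize) + (. & (maxpagesize - 1)).
// It keeps the offset within the page and moves up to the next page.
constexpr uint64_t bfdStyle(uint64_t dot, uint64_t maxPage) {
  return alignTo(dot, maxPage) + (dot & (maxPage - 1));
}

// Smallest file offset >= off that is congruent to vaddr modulo maxPage.
constexpr uint64_t nextFileOffset(uint64_t off, uint64_t vaddr,
                                  uint64_t maxPage) {
  return off + ((vaddr - off) & (maxPage - 1));
}

constexpr uint64_t maxPage = 0x200000;
constexpr uint64_t prevEnd = 0x4001d8; // end address and end file offset

// Next multiple of max-page-size: the file offset must jump to 0x600000.
static_assert(nextPageAligned(prevEnd, maxPage) == 0x600000, "");
static_assert(nextFileOffset(prevEnd, 0x600000, maxPage) == 0x600000, "");

// ld.bfd-style placement: the file offset can stay at 0x4001d8.
static_assert(bfdStyle(prevEnd, maxPage) == 0x6001d8, "");
static_assert(nextFileOffset(prevEnd, 0x6001d8, maxPage) == 0x4001d8, "");

// Padding avoided in the file: 0x600000 - 0x4001d8.
static_assert(0x600000 - 0x4001d8 == 0x1ffe28, "");
```

The static_asserts reproduce the numbers quoted in the comments: the second placement avoids 0x1ffe28 bytes of file padding, at the cost of the two PT_LOAD segments sharing one page of the file at runtime.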