This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Delete redundant pageAlign at PT_GNU_RELRO boundaries after D58892
Closed, Public

Authored by MaskRay on Jul 17 2019, 12:55 AM.

Details

Summary

D58892 splits the RW PT_LOAD on the PT_GNU_RELRO boundary. The new
PT_LOAD triggers:

if (p->p_type == PT_LOAD && p->firstSec)
  pageAlign(p->firstSec);

which makes the pageAlign at PT_GNU_RELRO boundaries redundant.
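A toy model may help show why the extra alignment is a no-op. The sketch below is hypothetical and does not use lld's real Writer code. `alignTo` rounds up to a power-of-two boundary, and rounding up is idempotent. So once the first section after the PT_GNU_RELRO boundary starts a new PT_LOAD and is page-aligned for that reason, a second alignment at the boundary cannot move it.

```cpp
#include <cstdint>

// Round v up to a multiple of align (align must be a power of two).
constexpr uint64_t alignTo(uint64_t v, uint64_t align) {
  return (v + align - 1) & ~(align - 1);
}

// Hypothetical model of the address picked for the first section after the
// PT_GNU_RELRO boundary.
//   startsPtLoad:    after D58892 this section begins a new RW PT_LOAD, which
//                    triggers the per-PT_LOAD pageAlign(p->firstSec).
//   extraRelroAlign: the separate PT_GNU_RELRO-boundary pageAlign this patch deletes.
constexpr uint64_t firstSecAddr(uint64_t dot, uint64_t maxPageSize,
                                bool startsPtLoad, bool extraRelroAlign) {
  if (startsPtLoad)
    dot = alignTo(dot, maxPageSize);
  if (extraRelroAlign)
    dot = alignTo(dot, maxPageSize);  // already aligned: no effect
  return dot;
}

// With or without the extra alignment, the section lands at the same address.
static_assert(firstSecAddr(0x4001d8, 0x200000, true, true) ==
                  firstSecAddr(0x4001d8, 0x200000, true, false),
              "extra pageAlign is redundant");
static_assert(firstSecAddr(0x4001d8, 0x200000, true, false) == 0x600000,
              "aligned to the next max-page-size multiple");
```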

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision. Jul 17 2019, 12:55 AM

Currently we align p_vaddr to the next multiple of max-page-size, instead of using ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) as ld.bfd does in its -z noseparate-code mode and in some cases in its -z separate-code mode. See the annotations in the listing below for an ld.lld -z max-page-size=0x200000 case:

  [11] .rodata           PROGBITS        0000000000000558 000558 000004 04  AM  0   0  4
  [12] .eh_frame_hdr     PROGBITS        000000000000055c 00055c 00002c 00   A  0   0  4
  [13] .eh_frame         PROGBITS        0000000000000588 000588 0000cc 00   A  0   0  8
/// Gap due to separated R-- and R-X.
/// This gap can be saved in --no-rosegment mode.
  [14] .text             PROGBITS        0000000000200000 200000 000161 00  AX  0   0 16
  [15] .init             PROGBITS        0000000000200164 200164 000017 00  AX  0   0  4
  [16] .fini             PROGBITS        000000000020017c 20017c 000009 00  AX  0   0  4
  [17] .plt              PROGBITS        0000000000200190 200190 000020 00  AX  0   0 16

  [18] .fini_array       FINI_ARRAY      0000000000400000 400000 000008 08  WA  0   0  8
  [19] .init_array       INIT_ARRAY      0000000000400008 400008 000008 08  WA  0   0  8
  [20] .dynamic          DYNAMIC         0000000000400010 400010 0001a0 10  WA  8   0  8
  [21] .got              PROGBITS        00000000004001b0 4001b0 000028 00  WA  0   0  8
  [22] .bss.rel.ro       NOBITS          00000000004001d8 4001d8 000000 00  WA  0   0  1
/// Gap due to PT_GNU_RELRO. It wastes almost 0x200000 bytes.
/// If we change p_vaddr of the RW PT_LOAD from 0x600000 to 0x6001d8, its p_offset doesn't need to be aligned, and we can save nearly 0x200000 bytes in the file.
  [23] .data             PROGBITS        0000000000600000 600000 000010 00  WA  0   0  8
  [24] .tm_clone_table   PROGBITS        0000000000600010 600010 000000 00  WA  0   0  8
  [25] .got.plt          PROGBITS        0000000000600010 600010 000020 00  WA  0   0  8
  [26] .bss              NOBITS          0000000000600030 600030 000001 00  WA  0   0  1

We perform several max-page-size alignments for this file and each costs nearly 0x200000 bytes. If we do the optimization as described in the comment, we can save a lot of disk space. This is particularly relevant to targets with a large defaultMaxPageSize (AArch64, MIPS (@atanasyan), and PPC (@sfertile): 65536).
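The cost can be tallied directly from the listing above. This sketch takes, for each PT_LOAD boundary, the file offset where the previous segment's file data ends (a section's offset plus its size) and the offset of the next segment's first section; the struct and names are illustrative only. The three gaps add up to roughly three times 0x200000 bytes.

```cpp
#include <cstdint>

// One PT_LOAD boundary from the readelf -S listing (file offsets).
struct Boundary {
  uint64_t prevEnd;    // end of the previous PT_LOAD's file data
  uint64_t nextStart;  // offset of the next PT_LOAD's first section
};

constexpr Boundary kBoundaries[] = {
    {0x588 + 0xcc, 0x200000},     // .eh_frame end -> .text       (R-- to R-X)
    {0x200190 + 0x20, 0x400000},  // .plt end -> .fini_array      (R-X to RW)
    {0x4001b0 + 0x28, 0x600000},  // .got end -> .data            (PT_GNU_RELRO boundary)
};

// Total bytes of file padding inserted by the max-page-size alignments.
constexpr uint64_t totalPadding() {
  uint64_t sum = 0;
  for (const Boundary &b : kBoundaries)
    sum += b.nextStart - b.prevEnd;
  return sum;
}

static_assert(totalPadding() == 0x5ff624, "nearly 3 * 0x200000 bytes of padding");
```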

What do you think of this trick?

ruiu added a comment. Jul 17 2019, 1:23 AM

Could you explain what ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) means? It looks like it aligns to the next multiple of MAXPAGESIZE (plus (. & (CONSTANT (MAXPAGESIZE) - 1))), so doesn't it consume a MAXPAGESIZE?

MaskRay added a comment (edited). Jul 17 2019, 1:36 AM

Say the end address of the last segment is 0x4001d8. The PT_LOAD covers the address range [0x400000, 0x4001d8) (at runtime, the end address will be rounded up). Currently we set the address of the new segment to the next multiple of max-page-size: 0x600000. Because p_offset and p_vaddr must be congruent modulo max-page-size, we also have to set its p_offset to 0x600000.
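The congruence rule determines how far the file offset must be pushed. A minimal sketch, assuming max-page-size is a power of two; `nextOffset` is a hypothetical helper, not an lld function. It returns the smallest file offset at or after the current one whose residue modulo max-page-size matches the segment's p_vaddr.

```cpp
#include <cstdint>

// Smallest file offset >= off such that
//   offset % maxPageSize == vaddr % maxPageSize   (maxPageSize a power of two).
constexpr uint64_t nextOffset(uint64_t off, uint64_t vaddr, uint64_t maxPageSize) {
  uint64_t mask = maxPageSize - 1;
  uint64_t want = vaddr & mask;   // residue required by p_vaddr
  uint64_t base = off & ~mask;    // start of the max-page containing off
  uint64_t cand = base + want;
  return cand >= off ? cand : cand + maxPageSize;
}

// Current behavior: p_vaddr = 0x600000 forces p_offset from 0x4001d8 up to 0x600000.
static_assert(nextOffset(0x4001d8, 0x600000, 0x200000) == 0x600000, "");
```

The same helper shows why keeping the low bits of the address avoids the padding: with p_vaddr = 0x6001d8 the required residue is 0x1d8, which the current offset 0x4001d8 already has.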

However, if we set the address of the new segment to 0x6001d8, its p_offset can be kept as 0x4001d8. This saves 0x600000 - 0x4001d8 = 0x1ffe28 bytes. Say the PT_LOAD covers [0x6001d8, 0x6002d8). At runtime, it will become [0x600000, 0x601000) if the actual page size is 0x1000. The address range [0x600000, 0x6002d8) (file offsets [0x400000, 0x4002d8)) shares file pages with the previous PT_LOAD segment.
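The arithmetic in this example can be checked mechanically. The sketch below evaluates the ld.bfd-style expression ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) for the end address 0x4001d8, then rounds the resulting segment to 0x1000-byte pages as a loader would. The runtime page size of 0x1000 and the 0x100-byte segment size are the assumptions from the example above; the helper names are hypothetical.

```cpp
#include <cstdint>

constexpr uint64_t kMaxPage = 0x200000;  // -z max-page-size=0x200000
constexpr uint64_t kPage = 0x1000;       // assumed runtime page size

constexpr uint64_t alignUp(uint64_t v, uint64_t a) { return (v + a - 1) & ~(a - 1); }
constexpr uint64_t alignDown(uint64_t v, uint64_t a) { return v & ~(a - 1); }

// ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1))
constexpr uint64_t bfdStyleVaddr(uint64_t dot) {
  return alignUp(dot, kMaxPage) + (dot & (kMaxPage - 1));
}

constexpr uint64_t kDot = 0x4001d8;               // end of the previous PT_LOAD
constexpr uint64_t kVaddr = bfdStyleVaddr(kDot);  // new segment's p_vaddr
constexpr uint64_t kOffset = kDot;                // p_offset needs no padding

static_assert(kVaddr == 0x6001d8, "");
static_assert(kVaddr % kMaxPage == kOffset % kMaxPage, "p_vaddr/p_offset stay congruent");
static_assert(0x600000 - kOffset == 0x1ffe28, "bytes saved vs p_offset = 0x600000");

// The loader maps whole pages: [0x6001d8, 0x6002d8) becomes [0x600000, 0x601000),
// backed by file offsets starting at 0x400000, which the previous PT_LOAD also maps.
static_assert(alignDown(kVaddr, kPage) == 0x600000, "");
static_assert(alignUp(kVaddr + 0x100, kPage) == 0x601000, "");
static_assert(alignDown(kOffset, kPage) == 0x400000, "");
```

Note that when `.` is already a multiple of max-page-size, the expression leaves it unchanged, so the trick never costs more than the current ALIGN.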

In BFD's -z separate-code mode (similar to our default), code is not supposed to share a page with adjacent PT_LOAD segments, so it is probably undesirable to apply this trick to R-X. If, however, --no-rosegment is specified, we can apply the trick to R-X as well.

ruiu added a comment. Jul 17 2019, 1:51 AM

Thanks for the explanation. That optimization makes sense itself, but I'm not sure how often you want to use --no-rosegment. Is that frequently used?

MaskRay updated this revision to Diff 210271. Jul 17 2019, 1:57 AM
MaskRay retitled this revision from [ELF] Delete redundant pageAlign of the first section after PT_GNU_RELRO after D58892 to [ELF] Delete redundant pageAlign at PT_GNU_RELRO boundaries after D58892.
MaskRay edited the summary of this revision.
MaskRay removed subscribers: atanasyan, sfertile.

Delete the whole PT_GNU_RELRO block

ruiu accepted this revision. Jul 17 2019, 2:16 AM

Code LGTM by the way, feel free to submit.

This revision is now accepted and ready to land. Jul 17 2019, 2:16 AM
This revision was automatically updated to reflect the committed changes.

I think BFD uses this at least once for all programs. As I understand it, BFD with its built-in linker script uses something like:

.gnu_extab   : ONLY_IF_RO { *(.gnu_extab*) }
/* These sections are generated by the Sun/Oracle C++ compiler.  */
.exception_ranges   : ONLY_IF_RO { *(.exception_ranges*) }
/* Adjust the address for the data segment.  We want to adjust up to
   the same address within the page on the next page up.  */
. = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
/* Exception handling  */
.eh_frame       : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) }
.gnu_extab      : ONLY_IF_RW { *(.gnu_extab) }
.gcc_except_table   : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
.exception_ranges   : ONLY_IF_RW { *(.exception_ranges*) }
/* Thread Local Storage sections  */
.tdata          :
 {
   PROVIDE_HIDDEN (__tdata_start = .);
   *(.tdata .tdata.* .gnu.linkonce.td.*)
 }

The DATA_SEGMENT_ALIGN function is defined in https://sourceware.org/binutils/docs/ld/Builtin-Functions.html#Builtin-Functions as evaluating to either (ALIGN(maxpagesize) + (. & (maxpagesize - 1))) or (ALIGN(maxpagesize) + ((. + commonpagesize - 1) & (maxpagesize - commonpagesize))), choosing the latter only if it uses fewer commonpagesize-sized pages for the data segment.

Our implementation of DATA_SEGMENT_ALIGN is just

. = ALIGN(maxpagesize);
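To contrast the two behaviors, here is a sketch of the selection rule as the binutils documentation describes it. It is not ld.bfd's actual code: `dataSize` (the distance from the result to DATA_SEGMENT_END) is taken as an assumed input, whereas ld.bfd determines it while relaxing the layout, and DATA_SEGMENT_RELRO_END adjustments are ignored. The example values (max page 0x10000, common page 0x1000) are illustrative.

```cpp
#include <cstdint>

constexpr uint64_t alignUp(uint64_t v, uint64_t a) { return (v + a - 1) & ~(a - 1); }

// Number of `page`-sized pages touched by [start, end).
constexpr uint64_t pagesTouched(uint64_t start, uint64_t end, uint64_t page) {
  return (alignUp(end, page) - (start & ~(page - 1))) / page;
}

// Sketch of DATA_SEGMENT_ALIGN(maxPage, commonPage) per the documented rule.
constexpr uint64_t dataSegmentAlign(uint64_t dot, uint64_t maxPage,
                                    uint64_t commonPage, uint64_t dataSize) {
  uint64_t former = alignUp(dot, maxPage) + (dot & (maxPage - 1));
  uint64_t latter = alignUp(dot, maxPage) +
                    ((dot + commonPage - 1) & (maxPage - commonPage));
  // Pick the latter only if it touches fewer common pages.
  return pagesTouched(latter, latter + dataSize, commonPage) <
                 pagesTouched(former, former + dataSize, commonPage)
             ? latter
             : former;
}

// lld's current implementation: . = ALIGN(maxpagesize);
constexpr uint64_t lldDataSegmentAlign(uint64_t dot, uint64_t maxPage) {
  return alignUp(dot, maxPage);
}

static_assert(dataSegmentAlign(0x12f00, 0x10000, 0x1000, 0x200) == 0x23000, "latter wins");
static_assert(dataSegmentAlign(0x12100, 0x10000, 0x1000, 0x200) == 0x22100, "former kept");
static_assert(lldDataSegmentAlign(0x12f00, 0x10000) == 0x20000, "");
```

In both BFD variants the result keeps some of the low bits of `.`, which is what lets the file offset avoid max-page-size padding; lld's plain ALIGN discards them.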

In the past the reason given for not using these tricks has been keeping it simple, which is understandable when getting linker scripts to work at all. Now that we have a more stable base and more tests I'm in favour of introducing more of the optimisations.