This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Simplify RelRo, TLS, NOBITS section ranks and make RW PT_LOAD start with RelRo
ClosedPublic

Authored by MaskRay on Jan 17 2019, 12:26 AM.

Details

Summary

Old: PT_LOAD(.data | PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) | .bss)
New: PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) | .data .bss)

The placement of | indicates page alignment caused by PT_GNU_RELRO. The
new layout has simler rules and saves space for many cases.

Old size: roundup(.data) + roundup(.data.rel.ro)
New size: roundup(.data.rel.ro + .bss.rel.ro) + .data

Other advantages:

  • At runtime the 3 memory mappings decrease to 2.
  • start(PT_TLS) = start(PT_GNU_RELRO) = start(RW PT_LOAD). This simplifies binary manipulation tools. GNU strip before 2.31 discards PT_GNU_RELRO if its address is not equal to the start of its associated PT_LOAD. This has been fixed by https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f2731e0c374e5323ce4cdae2bcc7b7fe22da1a6f But with this change, we will be compatible with GNU strip before 2.31
  • Before, .got.plt (non-relro by default) was placed before .got (relro by default), which made it impossible to have _GLOBAL_OFFSET_TABLE_ (start of .got.plt on x86-64) equal to the end of .got (R_GOT*_FROM_END) (https://bugs.llvm.org/show_bug.cgi?id=36555). With the new ordering, we can improve on this regard if we'd like to.

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision.Jan 17 2019, 12:26 AM

This would require fixing 100+ tests because of address changes. I want to get some feedback first...

There are other cases that the patch hasn't (but should) cover: NOBITS sections can also be RELRO; I need to check ld.bfd/gold's behavior

ruiu added a comment.Jan 17 2019, 9:34 AM

Not sure why you want to implement a workaround at the moment. Can you elaborate? This is a bug in GNU binutils, and the fix was submitted almost a year ago. It looks like we should keep doing what we think right and don't implement too many workarounds.

MaskRay added a comment.EditedJan 17 2019, 6:14 PM

Not sure why you want to implement a workaround at the moment. Can you elaborate? This is a bug in GNU binutils, and the fix was submitted almost a year ago. It looks like we should keep doing what we think right and don't implement too many workarounds.

Fixing GNU strip<2.31 (see https://sourceware.org/bugzilla/show_bug.cgi?id=22829#c6) was my motivation to investigate this issue but it is only a secondary reason why I propose this patch (the workaround we get for GNU strip<2.31 is a byproduct). The primary reason is that it can decrease 1 memory mapping (if PT_GNU_RELRO is in the middle, the PT_LOAD splits into 3 memory mappings), save some address space, and make it possible to let PT_GNU_RELRO have the same p_memsz and p_filesz in a future change.

<del>I'll check other relro changes to see if this patch is reasonable and feasible.</del>

Since rLLD291523, an lld-linked executable can have the following section layout:

.data
.data.rel.ro
.bss.rel.ro
.bss

where .data.rel.ro and .bss.rel.ro are included in PT_GNU_RELRO. SHT_NOBITS .bss.rel.ro was invented around rLLD291523 to save space reserved for copy relocation (ld.bfd/gold use .data.rel.ro and do not save the space)

After ld.so remaps PT_GNU_RELRO, the single memory mapping splits into 3 mappings.

By changing to start(PT_GNU_RELRO) = start(PT_LOAD), we work around GNU strip<2.31 and probably some other binary post-processing tools.
However, the change is not incompatible with .bss.rel.ro .

I want to understand more about the pros and cons of this change to see if it is really favorale. CC pcc@ to get some insights from him.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2019, 7:30 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript
ruiu added a comment.Mar 1 2019, 4:35 PM

Thank you for your explanation. I think that makes sense.

ELF/Writer.cpp
824–826 ↗(On Diff #182217)

I don't feel this comment explains the intention of the change clearly. IIUC, the intention of the change is to separate relro sections (e.g. .data.rel.ro and .bss.rel.ro) from non-relro sections (e.g. .data and .bss), so that we don't waste space for page alignment. An alternative ordering, e.g., .data, .data.rel.ro, .bss, .bss.rel.ro, wastes on average address space of 2 pages because each section would be page-aligned.

pcc added a comment.Mar 1 2019, 4:48 PM

So after this patch the layout will be

.data.rel.ro
.bss.rel.ro
.data
.bss

? That's valid I suppose, but I don't see how we avoid the extra mapping.

ruiu added a comment.Mar 1 2019, 4:53 PM
In D56828#1415833, @pcc wrote:

So after this patch the layout will be

.data.rel.ro
.bss.rel.ro
.data
.bss

? That's valid I suppose, but I don't see how we avoid the extra mapping.

You can at least save on average one page of memory address space, as the last page of .data.rel.ro and the first page of .bss.rel.ro can be on the same page?

pcc added a comment.Mar 1 2019, 5:09 PM

Well, with the old layout you could still place .data.rel.ro and .bss.rel.ro on the same page, but with the new one .data and .bss can now be placed together. That's probably where you get the largest benefit since they're more commonly used.

In terms of mappings I guess you save a mapping unless .bss.rel.ro is large enough to require another page, in which case you need another PT_LOAD (and another mapping).

MaskRay updated this revision to Diff 189046.Mar 2 2019, 5:06 AM
MaskRay retitled this revision from [ELF] Make RW PT_LOAD start with PT_GNU_RELRO sections to [ELF] Simplify RelRo, TLS, NOBITS section ranks and make RW PT_LOAD start with RelRo.
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a reviewer: pcc.
MaskRay removed a subscriber: jdoerfert.

Retitle

MaskRay added a reviewer: pcc.Mar 2 2019, 5:08 AM
joerg added a subscriber: joerg.Mar 2 2019, 10:19 AM

Placing .bss.rel.ro before .data doesn't make sense. It forces the content of .bss.rel.ro to embedded into the binary. I also don't really understand the motivation here. Memory mappins on the kernel side are quite cheap.

MaskRay added a comment.EditedMar 2 2019, 10:28 PM

Placing .bss.rel.ro before .data doesn't make sense. It forces the content of .bss.rel.ro to embedded into the binary. I also don't really understand the motivation here. Memory mappins on the kernel side are quite cheap.

.bss.rel.ro is a linker synthesized section whose only purpose is to reserve space for copy relocations of symbols in read-only segments. In most applications, we expect it is very small or does not exist at all.

I agree that in -z relro mode (default), when either .data or .bss exists (almost always true), the contents of .bss.rel.ro are embedded (as if it is .data.rel.ro; this is used by newer ld.bfd/gold). However, since .bss.rel.ro is small, this doesn't matter.

(We use .data.rel.ro as a representative for relro progbits sections like .dynamic .got)

old size: roundup(.data) + roundup(.data.rel.ro)
new size: roundup(.data.rel.ro + .bss.rel.ro) + .data
old-new = (page - .data%page)%page - roundup(.bss.rel.ro - (pagesize - .data.rel.ro%page)%page)

The first term is on average page/2 while the second term is (unbounded in theory), though 0 in most cases, pagesize in rare cases. The difference is usually positive => we decrease binary sizes.

So the advantage of this patch:

  • Simpler rule: we no longer need to place RelRo in the segment boundary.
  • start(PT_TLS) = start(PT_GNU_RELRO) = start(RW PT_LOAD): this simplifies some binary manipulation tools. As I noted, GNU strip<2.31 will not strip RelRo. It should also make llvm-objcopy's life easier. I don't know of a specific corner case but the split RW PT_LOAD may be bug-prone for tools.

Now In.BssRelRo serves as a marker (ld.bfd/gold call it dynrelro). In -z norelro mode, .bss.rel.ro naturally merges with .bss

Before, .got.plt (non-relro by default) is placed before .got (relro by default), which makes it impossible to have _GLOBAL_OFFSET_TABLE_ (start of .got.plt on x86-64) equal to end of .got (R_GOT*_FROM_END) (https://bugs.llvm.org/show_bug.cgi?id=36555): this is not a big deal though, as we've fixed tools to emit relocation types not relying the symbolic value of _GLOBAL_OFFSET_TABLE_. To the best of my knowledge it only affects glibc.

pcc added a comment.Mar 3 2019, 11:58 AM

I agree that in -z relro mode (default), when either .data or .bss exists (almost always true), the contents of .bss.rel.ro are embedded (as if it is .data.rel.ro; this is used by newer ld.bfd/gold). However, since .bss.rel.ro is small, this doesn't matter.

Not necessarily. As I mentioned in my earlier comment, you could have a second PT_LOAD for relro.

e.g. with the following layout:

.data.rel.ro (vaddr 0x3000 size 0x1000)
.bss.rel.ro (vaddr 0x4000 size 0x1000)
.data (vaddr 0x5000 size 0x1000)
.bss (vaddr 0x6000 size 0x1000)

You could lay out .data at file offset 0x4000 and then have the following program headers:

PT_LOAD rw p_offset = 0x3000 p_vaddr = 0x3000 p_filesz = 0x1000 p_memsz = 0x2000
PT_LOAD rw p_offset = 0x4000 p_vaddr = 0x5000 p_filesz = 0x1000 p_memsz = 0x2000
PT_GNU_RELRO p_offset = 0x3000 p_vaddr = 0x3000 p_filesz = 0x1000 p_memsz = 0x2000

I wouldn't expect .bss.rel.ro to always be small in the future. One thing that we could use it for is to compress the storage of relro sections which do not contain data, only relocations (a good example of this is most vtables in position-independent binaries). If a relro section's statically relocated data is all zeros, we could move it into .bss.rel.ro. Then the section itself would not take up space in the output file, only its relocations would.

MaskRay updated this revision to Diff 189119.Mar 3 2019, 10:53 PM
MaskRay edited the summary of this revision. (Show Details)
MaskRay added a subscriber: jdoerfert.

Update description

In D56828#1416503, @pcc wrote:

I agree that in -z relro mode (default), when either .data or .bss exists (almost always true), the contents of .bss.rel.ro are embedded (as if it is .data.rel.ro; this is used by newer ld.bfd/gold). However, since .bss.rel.ro is small, this doesn't matter.

Not necessarily. As I mentioned in my earlier comment, you could have a second PT_LOAD for relro.

e.g. with the following layout:

.data.rel.ro (vaddr 0x3000 size 0x1000)
.bss.rel.ro (vaddr 0x4000 size 0x1000)
.data (vaddr 0x5000 size 0x1000)
.bss (vaddr 0x6000 size 0x1000)

You could lay out .data at file offset 0x4000 and then have the following program headers:

PT_LOAD rw p_offset = 0x3000 p_vaddr = 0x3000 p_filesz = 0x1000 p_memsz = 0x2000
PT_LOAD rw p_offset = 0x4000 p_vaddr = 0x5000 p_filesz = 0x1000 p_memsz = 0x2000
PT_GNU_RELRO p_offset = 0x3000 p_vaddr = 0x3000 p_filesz = 0x1000 p_memsz = 0x2000

I wouldn't expect .bss.rel.ro to always be small in the future. One thing that we could use it for is to compress the storage of relro sections which do not contain data, only relocations (a good example of this is most vtables in position-independent binaries). If a relro section's statically relocated data is all zeros, we could move it into .bss.rel.ro. Then the section itself would not take up space in the output file, only its relocations would.

Thanks for the suggestion and pointing out the improvement direction of .bss.rel.ro ("If a relro section's statically relocated data is all zeros, we could move it into .bss.rel.ro")! The idea is illustrated below:

Old: PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) .data .bss)
New: PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro)) PT_LOAD(.data. .bss)

I've created https://reviews.llvm.org/D58892 for the idea. This patch alone requires 116 tests. Do you favor the new layout?

If yes, once this is accepted, I can work on D58892 and fix another 9 tests.

MaskRay marked an inline comment as done.Mar 4 2019, 4:59 PM

Ping :)

This is simpler than it appeared to be..

pcc added a comment.Mar 13 2019, 5:10 PM

Looks good, please go ahead and modify the tests.

MaskRay updated this revision to Diff 190555.Mar 13 2019, 6:45 PM

Add tests

Looks good, please go ahead and modify the tests.

I'll take that as approval as in offline discussion ruiu said he didn't have objection and was waiting on your decision :)

This revision was not accepted when it landed; it landed in state Needs Review.Mar 13 2019, 8:47 PM
This revision was automatically updated to reflect the committed changes.
pcc added a comment.Mar 14 2019, 7:10 PM

Looks good, please go ahead and modify the tests.

I'll take that as approval as in offline discussion ruiu said he didn't have objection and was waiting on your decision :)

I intended to review your test changes first. I have now done so.

lld/trunk/test/ELF/arm-plt-reloc.s
53

Same here.

lld/trunk/test/ELF/arm-thumb-plt-reloc.s
63

The calculations in this file need to be updated.

lld/trunk/test/ELF/pack-dyn-relocs-loop.s
12

Can you check that this test fails if you revert the change that added the test?

MaskRay marked 4 inline comments as done.Mar 14 2019, 8:20 PM

The three tests are fixed in https://reviews.llvm.org/rLLD356229

lld/trunk/test/ELF/pack-dyn-relocs-loop.s
12

Done.

Have to use -z norelro to counteract layout change.