Page MenuHomePhabricator

[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges
ClosedPublic

Authored by MaskRay on Jul 18 2019, 12:51 AM.

Details

Summary

This change affects the non-linker script case (precisely, when the
SECTIONS command is not used). It deletes 3 alignments at PT_LOAD
boundaries for the default case: the size of a powerpc64 binary can be
decreased by at most 192kb. The technique can be ported to other
targets.

Let me demonstrate the idea with a maxPageSize=65536 example:

When assigning the address to the first output section of a new PT_LOAD,
if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to
the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus
have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo
maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020,
0x20000) in the output.

Alternatively, if we advance the position to 0x20020, the new PT_LOAD
will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset!
Obviously 0x10020 is the choice because it leaves no gap.
At runtime, p_vaddr will be rounded down by pagesize
(65536 if pagesize=maxPageSize). This PT_LOAD will load additional
initial contents from p_offset ranges [0x10000,0x10020), which will also
be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is
in effect or if we are not transiting between executable and
non-executable segments.

ld.bfd -z noseparate-code leverages this technique to keep output small.
This patch implements the technique in lld, which is mostly effective on
targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3
removed alignments can save almost 3*65536 bytes.

Two places that rely on p_vaddr%pagesize = 0 have to be updated.

  1. We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor.
  2. Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details.

    On targets that we enable the technique (only PPC64 now), we can potentially make p_vaddr(PT_TLS)%p_align(PT_TLS) != 0 if sh_addralign(.tdata) < sh_addralign(.tbss)

    This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues:

    FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...)

    So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS).

The technique will be enabled (with updated tests) for other targets in
subsequent patches.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
MaskRay updated this revision to Diff 210956.Jul 20 2019, 5:28 AM
MaskRay edited the summary of this revision. (Show Details)

Align the RW PT_LOAD which includes PT_TLS, and add ppc64-tls-vaddr-align.s

A stage2 check-llvm check-clang check-lld passed before this revision. But
when I test this patch (the x86_64 version) in our internal code base, I
noticed PT_TLS may not satisfy p_vaddr%p_align=0. This update fixes the issue.

MaskRay updated this revision to Diff 210964.Jul 20 2019, 10:04 AM
MaskRay edited the summary of this revision. (Show Details)

Fix TP offset computation if p_vaddr % p_align != 0

MaskRay updated this revision to Diff 211021.Jul 21 2019, 10:51 PM

Fix gcc 8 -Wparentheses

Passed stage 2 check-llvm check-clang check-lld on a powerpc64le machine

MaskRay updated this revision to Diff 211406.Jul 23 2019, 8:41 PM
MaskRay edited the summary of this revision. (Show Details)
MaskRay added reviewers: dalias, nsz.

Comment about static TLS blocks Variants 1 and 2.

MaskRay marked an inline comment as done.Jul 23 2019, 8:44 PM
MaskRay added inline comments.
ELF/InputSection.cpp
611

@peter.smith Moved the comment here because this mostly applies on other Variant I targets (PPC,RISC-V,...).

MaskRay updated this revision to Diff 211410.Jul 23 2019, 9:22 PM

Reword the comment about TLS

MaskRay updated this revision to Diff 212493.Jul 30 2019, 7:47 PM

Add a p_vaddr%p_align = 0 hack to work around some ld.so bugs

MaskRay updated this revision to Diff 212494.Jul 30 2019, 8:02 PM

Edit a comment: FreeBSD rtld-elf amd64 has the same bug as glibc(i386 x86-64) https://sourceware.org/bugzilla/show_bug.cgi?id=24606

D64903 is committed. This patch is ready now. It can delete 3 alignments at PT_LOAD boundaries (because -z noseparate-code is the default): it can decrease the size of a powerpc64 executable/shared object by at most 192kb (it can save more than 96kb on average).

I'm happy to go ahead as this is a pre-requisite for supporting AArch64.

MaskRay updated this revision to Diff 212775.Aug 1 2019, 3:39 AM
MaskRay edited the summary of this revision. (Show Details)

Update description to mention ld.so bugs (why we have to add a workaround)

MaskRay edited the summary of this revision. (Show Details)Aug 1 2019, 3:42 AM
MaskRay updated this revision to Diff 212779.Aug 1 2019, 4:29 AM

Move AArch64 TLS formula here

🥳It would be nice to get this in. Recently I learned that Brandon Bergren at FreeBSD used a local patch to fix the powerpc64 size regression caused by lld:

  // We need 64K pages (at least under glibc/Linux, the loader won't
  // set different permissions on a finer granularity than that).
-  defaultMaxPageSize = 65536;
+ defaultMaxPageSize = 4096;

(It works for them because they use 4k pagesize, but a general fix (this patch) will be preferable.) D64906 is also a prerequisite for aarch64 support, which many more people may want.

It turns out I need the proper fix after all in my local tree, because lately I have been working on getting a working Petitboot loader binary, and that means I'm technically cross compiling code for ppc64le Linux. So yeah, it would be very nice to get this in.

ruiu accepted this revision.Aug 20 2019, 12:48 AM

LGTM

This revision is now accepted and ready to land.Aug 20 2019, 12:48 AM
MaskRay updated this revision to Diff 216062.Aug 20 2019, 1:17 AM

Rebase. 3 ppc tests have changed recently.

troyj added a subscriber: troyj.Oct 16 2019, 1:39 PM
p->p_memsz = alignTo(p->p_offset + p->p_memsz, config->commonPageSize) - p->p_offset;

I think the whole rounding step is questionable, not simply this change to it. As far as I can tell from researching this, the rounding down that occurs is for the starting address to place RELRO on a page boundary. The size of RELRO does not get rounded down, so rounding it up here by any amount risks making more data read-only than is necessary, which can lead to seg faults.

p->p_memsz = alignTo(p->p_offset + p->p_memsz, config->commonPageSize) - p->p_offset;

I think the whole rounding step is questionable, not simply this change to it. As far as I can tell from researching this, the rounding down that occurs is for the starting address to place RELRO on a page boundary. The size of RELRO does not get rounded down, so rounding it up here by any amount risks making more data read-only than is necessary, which can lead to seg faults.

p->p_memsz = alignTo(p->p_offset + p->p_memsz, config->commonPageSize) -
             p->p_offset;

is necessary. In GNU ld, the last page of RELRO may not be protected as documented. To make that page protected on all of glibc/musl/FreeBSD libc, the change like https://reviews.llvm.org/D28267 is needed. Please also read https://reviews.llvm.org/D64906#1592854

If you cannot use -z norelro, you may try -z separate-code or -z separate-loadable-segments (D67481)

troyj added a comment.Oct 17 2019, 5:57 AM

Respectfully, I've read all of that plus https://www.airs.com/blog/archives/189, and we've arrived at different conclusions. I'm fine with maintaining a local patch; I just wanted to point it out in case it was useful to others upstream.

Respectfully, I've read all of that plus https://www.airs.com/blog/archives/189, and we've arrived at different conclusions. I'm fine with maintaining a local patch; I just wanted to point it out in case it was useful to others upstream.

Can you be more specific about how this conflicts with the blog post? Out of curiosity I want to learn why your software needs a local patch. I am 90% certain that that specific software makes unfounded assumption about the section/segment layout. GNU ld places PT_GNU_RELRO starting at a maxpagesize boundary and ending at a commonpagesize boundary. The last page may be unprotected. I agree that https://reviews.llvm.org/D29242 is not very ideal but it does not matter in practice: if runtime pagesize is smaller than commonpagesize(4096 on all targets but SPARCV9 that are supported by lld; 4096 on most targets supported by GNU ld), it can segfault.

There some some other differences, e.g. lld has .bss.rel.ro when no SECTIONS command is used. This was motivated by An Evil Copy: How the Loader Betrays You (I really dislike the misleading and exaggerated title) GNU ld and gold haven't implemented this. 2 RW schemes play well with PT_GNU_RELRO. etc

Can you be more specific about how this conflicts with the blog post?

The blog post says "Note that the current dynamic linker code will only work correctly if the PT_GNU_RELRO segment starts on a page boundary. This is because the dynamic linker rounds the p_vaddr field down to the previous page boundary." The lld code comment says "musl/glibc ld.so rounds the size down" and then proceeds to round the size up in what appears to be a countermeasure. So the blog post is talking about the starting address of the segment, but the lld code is rounding the size, not the starting address. For programs that I have linked, the starting address appears to already be on a page boundary, so no rounding is required there, and apply any rounding to the size results in an error because too much of the RW data segment gets marked RO.

Out of curiosity I want to learn why your software needs a local patch. I am 90% certain that that specific software makes unfounded assumption about the section/segment layout.

Yes, it makes assumptions that may be unfounded but happen to match ld.bfd and ld.gold behavior. Specifically, it assumes that the linker creates a single RW segment like ld.bfd and ld.gold, and that the starting address of that RW segment matches the starting address of the RELRO segment. I'm not aware of any rule that ld.lld violates by creating more than one RW segment, but it is inconvenient that it does not match the behavior of the other linkers. My local patch stops splitting the RW segment so that there is only one segment and then removes the rounding of the RELRO size. IF I leave the rounding in place, then the entire RW segment ends up being covered by the RELRO, which is bad because then the program can't write to any of its data. With the rounding removed, the emitted layout is much closer to the one emitted by ld.bfd and ld.gold.

The last page may be unprotected.

The last page being unprotected is better than incorrectly making the first page of writable data be RELRO. The former may miss identifying some programming errors or possibly open a security hole, but the latter certainly leads to the program crashing.

A better way might be to nudge the start of the writable data to begin later in the RW segment so that an integral number of initial pages can be RELRO, but ld.lld doesn't seem to do that and I'm not familiar with the code enough to add that myself. Hence, I'm settling for possibly not protecting all of the RO data until I know of a way to do the above.

MaskRay added a comment.EditedOct 21 2019, 6:11 PM

Can you be more specific about how this conflicts with the blog post?

The blog post says "Note that the current dynamic linker code will only work correctly if the PT_GNU_RELRO segment starts on a page boundary. This is because the dynamic linker rounds the p_vaddr field down to the previous page boundary." The lld code comment says "musl/glibc ld.so rounds the size down" and then proceeds to round the size up in what appears to be a countermeasure. So the blog post is talking about the starting address of the segment, but the lld code is rounding the size, not the starting address. For programs that I have linked, the starting address appears to already be on a page boundary, so no rounding is required there, and apply any rounding to the size results in an error because too much of the RW data segment gets marked RO.

It is best not to interpret this as a doctrine. musl does mprotect(laddr(p, p->relro_start), p->relro_end-p->relro_start, PROT_READ) where relro_end is computed as (ph->p_vaddr + ph->p_memsz) & -PAGE_SIZE (glibc is similar). The changed formula matches how musl/glibc mprotect PT_GNU_RELRO. FreeBSD rounds the size up. I have said it can be problematic in one of my previous comments and reiterated in my previous comment to your question.

"PT_GNU_RELRO segment starts on a page boundary" does not preclude the possibility that p_vaddr%maxpagesize!=0.

Out of curiosity I want to learn why your software needs a local patch. I am 90% certain that that specific software makes unfounded assumption about the section/segment layout.

Yes, it makes assumptions that may be unfounded but happen to match ld.bfd and ld.gold behavior. Specifically, it assumes that the linker creates a single RW segment like ld.bfd and ld.gold, and that the starting address of that RW segment matches the starting address of the RELRO segment. I'm not aware of any rule that ld.lld violates by creating more than one RW segment, but it is inconvenient that it does not match the behavior of the other linkers. My local patch stops splitting the RW segment so that there is only one segment and then removes the rounding of the RELRO size. IF I leave the rounding in place, then the entire RW segment ends up being covered by the RELRO, which is bad because then the program can't write to any of its data. With the rounding removed, the emitted layout is much closer to the one emitted by ld.bfd and ld.gold.

Your software may need -z separate-loadable-segments, as I mentioned in my previous comments to your question. We still have 2 RW, but they don't have overlapping p_offset, and they can actually be merged into one. Put it in another way, after mprotect'ing PT_GNU_RELRO, the memory mapping layout is not different from the case when there is only one RW. If your program does not parse its program header, there should be no runtime perceivable behavior differences. Merging 2 RW into one can be seen as an optimization, and lld does not support it. I think it is fine because a program header just costs sizeof(Elf64_Phdr)=56 bytes on ELF64.

The last page may be unprotected.

The last page being unprotected is better than incorrectly making the first page of writable data be RELRO. The former may miss identifying some programming errors or possibly open a security hole, but the latter certainly leads to the program crashing.

A better way might be to nudge the start of the writable data to begin later in the RW segment so that an integral number of initial pages can be RELRO, but ld.lld doesn't seem to do that and I'm not familiar with the code enough to add that myself. Hence, I'm settling for possibly not protecting all of the RO data until I know of a way to do the above.

The current layout is R RX RW(RELRO) RW(non-RELRO). Android folks proposed an alternative layout in August https://lists.llvm.org/pipermail/llvm-dev/2019-August/134801.html You can find David Chisnall's and my replies. I am not convinced it improves things.

troyj added a comment.Oct 22 2019, 9:35 AM

If your program does not parse its program header, there should be no runtime perceivable behavior differences.

It parses the program header. That's why it knows that there is more than one RW. It's looking for a single RW so that it can mremap its data, but it's confused when it finds more than one RW.

Merging 2 RW into one can be seen as an optimization, and lld does not support it.

Well....hold on. lld used to emit one RW prior to that patch, and now it doesn't. My issue is that I'm fine reverting the patch locally, but then RELRO overlaps the entire RW due to the rounding. So far I've tried eliminating the rounding, but now I'm trying it with the rounding restored and attempting to insert a dummy section prior to .data. to push .data. up to the next page boundary. I'm trying to figure that part out now, but it's not clear if I need to create it early as another synthetic section or if I can just create one in Writer.cpp.

If your program does not parse its program header, there should be no runtime perceivable behavior differences.

It parses the program header. That's why it knows that there is more than one RW. It's looking for a single RW so that it can mremap its data, but it's confused when it finds more than one RW.

Merging 2 RW into one can be seen as an optimization, and lld does not support it.

Well....hold on. lld used to emit one RW prior to that patch, and now it doesn't. My issue is that I'm fine reverting the patch locally, but then RELRO overlaps the entire RW due to the rounding.

My feeling is one RW partially overlapping PT_GNU_RELRO can be more problematic. Because lld does not have code to align a middle section in a segment to maxpagesize. "overlaps the entire RW due to the rounding" is indeed the problem you may face. I think the current lld segment layout is better than GNU linkers'.

So far I've tried eliminating the rounding, but now I'm trying it with the rounding restored and attempting to insert a dummy section prior to .data. to push .data. up to the next page boundary. I'm trying to figure that part out now, but it's not clear if I need to create it early as another synthetic section or if I can just create one in Writer.cpp.

I suggest that you update the software that parses RW. If you really want to keep a local lld patch that restores the original behavior (which I consider inferior), in Writer.cpp:fixSectionAlignments, sets addrExpr to align to the maxpagesize for the first section after the last PT_GNU_RELRO section.

troyj added a comment.Oct 23 2019, 5:57 AM

I suggest that you update the software that parses RW. If you really want to keep a local lld patch that restores the original behavior (which I consider inferior) ...

Thank you for the advice. I understand from your perspective it is inferior, so I'm not trying to submit an upstream patch for this. I'm dealing with released software that is not receiving any further updates but will need to coexist with a new toolchain that uses lld. I can modify lld; I can't modify the other software.