This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Only add contiguous isRelro sections to PT_GNU_RELRO PHDR
AbandonedPublic

Authored by peter.smith on Nov 14 2017, 8:14 AM.

Details

Summary

Some linker scripts, in particular the default ld.bfd linker script can match the InX::BssRelRo section in the .bss OutputSection . This causes the OutputSections for which isRelroSection() is true to be non-contiguous in the memory map. If we add all the OutputSections to the PT_GNU_RELRO PHDR then the size of said PHDR will include read-write sections. This may cause segfaults if the PT_GNU_RELRO PHDR covers read-write data that is written to.

This change only adds OutputSections for which isRelroSection() is true to the PT_GNU_RELRO PHDR if they are contiguous in the memory map. Later OutputSections for which isRelroSection() is true will be read-write.

Fixes PR35265

Implementation Notes:
I found this problem when using the default linker script from ld.bfd (script obtainable via ld --verbose). Even with an empty main the program would segfault when writing back the result of lazy PLT resolution into the .got.plt. The segfault was caused by the .got.plt section being marked read-only as it was covered by the PT_GNU_RELRO PHDR. The reason it was covered was due to the zero sized InX::BssRelRo section matching the .bss Output Section in the linker script extract below. As the current code calculates the size of the PT_GNU_RELRO PHDR as P->p_memsz = Last->Addr + Last->Size - First->Addr; we get the size of .got.plt and .data included in the PT_GNU_RELRO PHDR as InX::BssRelRo is Last.

The impact of the fix is that if the linker script does produce non-contiguous isRelroSections then only the first contiguous chunk will be covered by the PT_GNU_RELRO PHDR. Any remaining isRelroSections will be read-write in the output. I've chosen this in favour of compatibility of the default ld.bfd linker script; however a stricter view would be that it is an error if isRelroSections are not contiguous as this could be a potential security hole. I note that it was a 0 sized InX::BssRelRo matching in *(.bss .bss.* .gnu.linkonce.b.*) that passed the if (needsPtLoad(Sec) && isRelroSection(Sec)) test. Perhaps if non-contiguous isRelroSections is an error then 0 sized InX::BssRelRo can be an exception as this is not an obvious user-error.

.got            : { *(.got) *(.igot) }
. = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .);
.got.plt        : { *(.got.plt)  *(.igot.plt) }
.data           :
{
  *(.data .data.* .gnu.linkonce.d.*)
  SORT(CONSTRUCTORS)
}
.data1          : { *(.data1) }
_edata = .; PROVIDE (edata = .);
. = .;
__bss_start = .;
.bss            :
{
 *(.dynbss)
 *(.bss .bss.* .gnu.linkonce.b.*)
 *(COMMON)
 /* Align here to ensure that the .bss section occupies space up to
    _end.  Align after .bss to ensure correct alignment even if the
    .bss section disappears because there are no input sections.
    FIXME: Why do we need it? When there is no .bss section, we don't
    pad the .data section.  */
 . = ALIGN(. != 0 ? 64 / 8 : 1);
}

Diff Detail

Event Timeline

peter.smith created this revision.Nov 14 2017, 8:14 AM
ruiu edited edge metadata.Nov 14 2017, 11:45 PM

I wonder if we can just have two or more PT_GNU_RELRO segments. Is it illegal?

In D40029#925731, @ruiu wrote:

I wonder if we can just have two or more PT_GNU_RELRO segments. Is it illegal?

In theory there is nothing in what passes for the specification of PT_GNU_RELRO that prohibits it, the most official reference I could find was the Linux Standards Base (https://refspecs.linuxfoundation.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic.html):

PT_GNU_RELRO
the array element specifies the location and size of a segment which may be made read-only after relocations have been processed.

In practice relro is a collaboration/collusion between linkers and loaders and to the best of my knowledge the linux loader only records one relro segment per link_map. See struct link_map in https://code.woboq.org/userspace/glibc/include/link.h.html . The ELF processing code also seems to assume only one PT_GNU_RELRO, I think the loader would process either the first or last and ignore the others. I don't know if the BSD loader can handle more than one PT_GNU_RELRO program header.

ruiu added a comment.Nov 15 2017, 7:03 PM

I think that the right thing to do is to create a RELRO segment for each contiguous RELRO sections and then update the loader so that it can handle multiple RELROs. What do you think?

In D40029#927020, @ruiu wrote:

I think that the right thing to do is to create a RELRO segment for each contiguous RELRO sections and then update the loader so that it can handle multiple RELROs. What do you think?

Hmmm, I think that approach will be difficult for a couple of reasons. The first is that I think it will be difficult to get all the existing dynamic loaders to change unless we could think of an important use-case that can't be solved by just a single segment. I think that if I were a dynamic loader author I'd say "Why do you want multiple non-contiguous RELRO segments in an ELF file designed for an OS with a standard image layout? Why do I need to add complexity for an edge case?". The second is that there is a large lag time between getting a patch accepted in glibc and it getting deployed, for LTS releases we could be looking at several years before loaders that could support multiple RELRO segments.

Looking at the specific problem again, and coming across D28272 "Reserve space for copy relocations of read-only symbols in relro" what I think has happened here is:

  • LLD uses a section InX::BssRelRo with name .bss.rel.ro to contain copy relocations of read-only symbols, if the relro sections are not matched in the linker script these get placed in a single contiguous chunk by sortSections().
  • ld.bfd and gold use a section called .data.rel.ro so .bss.rel.ro doesn't appear on the default ld linker script and unfortunately the .bss.* matches our .bss.rel.ro so we get non-contiguous relro.
  • In comment https://reviews.llvm.org/D28272#637284 there is a comment

Relatedly, I discovered a bug in this patch. We can't use SHT_NOBITS for the new section because we need to lay out all NOBITS sections contiguously at the end of the LOAD. RELRO also needs to be contiguous (there can only be one RELRO header), so we can't have another
RELRO covering part of the bss.

In principle, we could teach the linker to emit two r/w LOAD headers, one for RELRO and the other for non-RELRO, each with its own bss, but it's not clear that that would be worth it. So we might as well use SHT_PROGBITS and rename the section to .data.rel.ro to match the type. > I'll upload a new patch that does that.

It isn't clear to me that the follow up patch landed, if it did we wouldn't have had this problem as .data.rel.ro would have matched in the linker script. Given how difficult it will be to update all the loaders to support multiple RELRO headers I think that the best course of action is:

  • Work out what happened to the follow up patch for D28272, and get it applied. This would make the default ld linker script work out of the box.
  • Warn or error if multiple non-contiguous relro is encountered. At least on the glibc loader the PT_GNU_RELRO is searched for in more than one direction so outputting more than one could cause crashes or security leaks.

I'll add the author of D28272 as a reviewer.

Adding pcc as the original author of D28272.

Peter, in https://reviews.llvm.org/D28272#637284 you mention that you'd add a follow up patch that used .data.rel.ro rather than .bss.rel.ro for the copy relocations. I can't work out if that ever landed. Can you let me know if did? I've just ran into a case where the assumptions made by a ld.bfd linker script cause problems with the .bss.rel.ro name.

ruiu added a comment.Nov 16 2017, 3:27 AM

Hmmm, I think that approach will be difficult for a couple of reasons. The first is that I think it will be difficult to get all the existing dynamic loaders to change unless we could think of an important use-case that can't be solved by just a single segment. I think that if I were a dynamic loader author I'd say "Why do you want multiple non-contiguous RELRO segments in an ELF file designed for an OS with a standard image layout? Why do I need to add complexity for an edge case?". The second is that there is a large lag time between getting a patch accepted in glibc and it getting deployed, for LTS releases we could be looking at several years before loaders that could support multiple RELRO segments.

I don't know if it is actually a problem. In this patch, you are attempting to add a RELRO segment only for the first contiguous regions that should be RELRO. My understanding is that the existing loader works OK with multiple RELRO segments, but only the first RELRO segment takes effect. That means, if you create multiple RELRO segments, that effectively works the same as creating a RELRO for the first contiguous region, no? If that's the case, creating multiple RELRO should fix the crash bug that you are trying to fix.

By doing that, we can buy time to fix dynamic loaders. As a result, the situation will be (1) if you are using an old loader, your binary still work, but the second and further RELRO segments are ignored, and (2) if you are using a newer loader, all RELRO segments are processed.

The above was my plan. What am I missing?

In D40029#927209, @ruiu wrote:

Hmmm, I think that approach will be difficult for a couple of reasons. The first is that I think it will be difficult to get all the existing dynamic loaders to change unless we could think of an important use-case that can't be solved by just a single segment. I think that if I were a dynamic loader author I'd say "Why do you want multiple non-contiguous RELRO segments in an ELF file designed for an OS with a standard image layout? Why do I need to add complexity for an edge case?". The second is that there is a large lag time between getting a patch accepted in glibc and it getting deployed, for LTS releases we could be looking at several years before loaders that could support multiple RELRO segments.

I don't know if it is actually a problem. In this patch, you are attempting to add a RELRO segment only for the first contiguous regions that should be RELRO. My understanding is that the existing loader works OK with multiple RELRO segments, but only the first RELRO segment takes effect. That means, if you create multiple RELRO segments, that effectively works the same as creating a RELRO for the first contiguous region, no? If that's the case, creating multiple RELRO should fix the crash bug that you are trying to fix.

By doing that, we can buy time to fix dynamic loaders. As a result, the situation will be (1) if you are using an old loader, your binary still work, but the second and further RELRO segments are ignored, and (2) if you are using a newer loader, all RELRO segments are processed.

The above was my plan. What am I missing?

The only thing that I think that you are missing is that with at least the current state of glibc, there is at least one case where the PT_GNU_RELRO PHDR is searched for from the end of the table, the comment alongside the code says, the PT_GNU_RELRO PHDR is usually last. So if we output more than one we can't guarantee that the first PHDR will be selected, this could be dangerous.

My suggestion is to:

  • Change lld to use .data.rel.ro instead of .bss.rel.ro; this will sort out the non-user error case.
  • I would prefer to emit a warning and only output one PT_GNU_RELRO header, although I concede that putting out as many PT_GNU_RELRO header as we need is valid response to a linker script containing in effect a user-error, however I think we should only do this until we've made the .bss.rel.ro change.

I see. Thank you for explaining. But, how much dangerous is it to use only the last RELRO compared to only the first one? I wonder if it makes sense to just reverse the RELRO segments. Anyway, I'm also interested in hearing from pcc.

Looking at the source rtld.c https://code.woboq.org/userspace/glibc/elf/rtld.c.html it looks like it is only the PT_GNU_RELRO for the dynamic loader shared object itself that is searched in reverse order. For all executables and shared objects all the PHDRS in the table are processed in order, with PT_GNU_RELRO just a case in a switch table. I think that this means that the last PT_GNU_RELRO that is encountered will describe the region to be protected. I don't think that this will be dangerous, it will just mean that only one of the RELRO segments will be protected, which could be a security hole.

pcc edited edge metadata.Nov 16 2017, 9:44 AM

Adding pcc as the original author of D28272.

Peter, in https://reviews.llvm.org/D28272#637284 you mention that you'd add a follow up patch that used .data.rel.ro rather than .bss.rel.ro for the copy relocations. I can't work out if that ever landed. Can you let me know if did? I've just ran into a case where the assumptions made by a ld.bfd linker script cause problems with the .bss.rel.ro name.

No, the use of NOBITS is deliberate. See http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170102/417025.html

Could we instead rename .bss.rel.ro to .data.rel.ro only if the linker script has a SECTIONS clause?

Thanks for pointing that out. I missed that part of the discussion on the Phabricator review. I'll have an experiment with renaming to see what happens.

Updating diff based on mail comments llvm-commits, rather than paste everything in here, the comments start from these points.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171113/503153.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171120/503888.html

My summary of the comments is:

  • It is better to give an error message when the relro sections are not contiguous.
  • Ignore zero sized sections when checking for contiguous relro sections.
  • We should only attempt to rename the .bss.rel.ro section when a script is in use as we would like to use bss as much as possible.

The changes I've made for this update:

  • Give an error message if relro sections are not contiguous.
  • If the linker script has SECTIONS and a .data.rel.ro OutputSection then use the name .data.rel.ro.bss instead of .bss.rel.ro. I've used the reasoning that the vast majority of linker scripts are not going to mess about with linker defined sections, where as an explicit .data.rel.ro is a sign that someone wanting GNU linker script compatibility. I'm happy to change this if there is a better heuristic.

I'm going to split this patch up into 3 at the suggestion of this message from [llvm-commits].

I think I am fine with the overall direction of this change.

My main request is to split the patch in 2 or even better 3:

  • Give an error if relro is not contiguous.
  • Relax handling of empty sections.
  • Name change when linker scripts are used.

+ bool InRelroPhdr = false;
+ bool IsRelroFinished = false;
+ for (OutputSection *Sec : OutputSections) {
+ if (!needsPtLoad(Sec))
+ continue;
+ if (isRelroSection(Sec)) {
+ InRelroPhdr = true;
+ if (!IsRelroFinished)
+ RelRo->add(Sec);
+ else if (Sec->Size)
+ error("section: " + Sec->Name + " is not contiguous with other relro" +
+ " sections");
+ } else if (InRelroPhdr && Sec->Size) {
+ InRelroPhdr = false;
+ IsRelroFinished = true;
+ }
+ }

With this an empty relro section can still start the PT_GNU_RELRO. Is
that intentional?

Cheers,
Rafael

The first patch to output the error message is available at https://reviews.llvm.org/D40359. I've noticed that the size of the .dynamic section hasn't been set at the time createPhdrs() is called so I may need to think about the relaxation of the 0 size sections for a bit.

peter.smith abandoned this revision.Dec 15 2017, 3:17 AM

This revision was split up and fixed in D40359 D40365 and D40735 abandoning as it is no longer needed.