This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Map the ELF header at imageBase
ClosedPublic

Authored by MaskRay on Sep 7 2019, 10:50 PM.

Details

Summary

If there is no readonly section, we map:

  • The ELF header at imageBase+maxPageSize
  • Program headers at imageBase+maxPageSize+sizeof(Ehdr)
  • The first section .text at imageBase+maxPageSize+sizeof(Ehdr)+sizeof(program headers)

Due to the interaction between Writer<ELFT>::fixSectionAlignments and
LinkerScript::allocateHeaders,
alignDown(p_vaddr(R PT_LOAD)) = alignDown(p_vaddr(RX PT_LOAD)).
The RX PT_LOAD will override the R PT_LOAD at runtime, which is not ideal:

// PHDR at 0x401034, should be 0x400034
  PHDR           0x000034 0x00401034 0x00401034 0x000a0 0x000a0 R   0x4
// R PT_LOAD contains just Ehdr and program headers.
// At 0x401000, should be 0x400000
  LOAD           0x000000 0x00401000 0x00401000 0x000d4 0x000d4 R   0x1000
  LOAD           0x0000d4 0x004010d4 0x004010d4 0x00001 0x00001 R E 0x1000
  • createPhdrs allocates the headers to the R PT_LOAD.
  • fixSectionAlignments assigns imageBase+maxPageSize+sizeof(Ehdr)+sizeof(program headers) (formula: alignTo(dot, maxPageSize) + dot % config->maxPageSize) to addrExpr of .text
  • allocateHeaders computes the minimum address among SHF_ALLOC sections, i.e. addr(.text)
  • allocateHeaders sets address of ELF header to addr(.text)-sizeof(Ehdr)-sizeof(program headers) = imageBase+maxPageSize

The main observation is that when the SECTIONS command is not used, we
don't have to call allocateHeaders. This requires an assumption that
the presence of PT_PHDR and addresses of headers can be decided
regardless of address information.

This may seem natural because dot is not manipulated by a linker script.
The other thing is that we have to drop the special rule for -T<section>
in getInitialDot. If -Ttext is smaller than the image base, the headers
will not be allocated with the old behavior (allocateHeaders is called)
but always allocated with the new behavior.

The behavior change is not a problem. Whether and where headers are
allocated can vary among linkers, or ld.bfd across different versions
(--enable-separate-code or not). It is thus advised to use a linker
script with the PHDRS command to have a consistent behavior across
linkers. If PT_PHDR is needed, an explicit --image-base can be a simpler
alternative.

Event Timeline

MaskRay created this revision.Sep 7 2019, 10:50 PM
MaskRay updated this revision to Diff 219264.Sep 8 2019, 5:56 AM
MaskRay retitled this revision from [ELF] Place the ELF header at imageBase to [ELF] Map the ELF header at imageBase.
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a subscriber: wuzish.

Place -> Map (I'm not sure the right term to use, but "map" may be a better term to describe that this applies to p_vaddr, not p_offset)

Add some comments.

I'm struggling a bit with this one at the moment. I've put some comments in but I still need to work this one through. As I understand it we have today:

  • allocateHeaders() that decides whether the program headers are allocated or not and assigns the headers addresses. This is the same for both linkerscripts and non-linkerscripts.
  • fixSectionAlignments is performed only for the non-linkerscript case just prior to the final address allocation. It adds expressions to the OutputSections so that it will be aligned properly.

There is a clash between allocateHeaders and fixSectionAlignments and your change resolves this by only running allocateHeaders in the linkerscript. I can see that this would work, but I think that this could be at the expense of missing some of the logic in allocateHeaders, and leaving a point to diverge in the future. As this is making me a bit nervous at the moment I'd like to try and understand the conflict a bit better and whether there is a better way of doing this (there may not be).

ELF/LinkerScript.cpp
1013–1014

I think this comment would be stale with your proposed change.

1018

Given that allocateHeaders is only called once, it may be worth moving to the callsite in Writer.cpp

if (script->hasSectionsCommand)
    script->allocateHeaders(mainPart->phdrs)
1069

Delay may not be the right word as assignAddresses() is called multiple times and I think allocateHeaders() is called early, most likely before assignAddresses().

Perhaps:
// With a linker script assignment of addresses to headers is covered by allocateHeaders().

1072

I think that this would get nmagic and omagic wrong. In this case the ELF Headers and Program Headers are not allocated, even with the default linker script. I'd expect a linker script to be used with nmagic and omagic, but this would be a change in behaviour from ld.bfd and older versions of LLD.

MaskRay updated this revision to Diff 219550.Sep 10 2019, 8:18 AM
MaskRay marked 4 inline comments as done.

Address review comments

MaskRay added a comment.EditedSep 10 2019, 8:25 AM

There is a clash between allocateHeaders and fixSectionAlignments and your change resolves this by only running allocateHeaders in the linkerscript. I can see that this would work, but I think that this could be at the expense of missing some of the logic in allocateHeaders, and leaving a point to diverge in the future. As this is making me a bit nervous at the moment I'd like to try and understand the conflict a bit better and whether there is a better way of doing this (there may not be).

The main observation of the change is that if we can drop the -T<section> special rule (as we can see, ld.bfd's rule is quite involved; our current behavior already diverges; lld users can get similar behaviors with explicit --image-base), we can skip the whole allocateHeaders logic for the !hasSectionsCommand case.

For the default !hasSectionsCommand case, assignAddresses takes care of assigning addresses to headers, and there should be no case that the ELF header or program headers can be discarded. ld.bfd can discard them for the cases like -Ttext=0 but we can argue they are involved cases, and users should use a linker script to get more predictable behavior. (I can hardly find -Ttext=0 cases in the wild. I think they are either 1) --oformat binary (unaffected by this change) 2) used with SECTIONS command (unaffected by this change) 3) we fail to meet their requirement because our different layout. So the behavior for 3) may not matter much.

ELF/LinkerScript.cpp
1013–1014

When using a linker script, we also check if the headers are covered by the output section. This allows omitting the headers by not leaving enough space for them in the linker script; this pattern is common in embedded systems.

I think the two sentences were stale before this change. I'll delete them.

1072

In the nmagic and omagic cases, PT_PHDR and PT_INTERP are not allocated, but other program headers can still exist.

Several tests check their behaviors, e.g. nmagic.s, segments.s, magic-page-combo-warn.s. This change does not need an update of them.

I'll take another look tomorrow. Thanks for the clarifications.

ELF/LinkerScript.cpp
1011

From a double check, it seems like unconditional isn't true any more. In createPhdrs() nmagic and omagic don't add the ELF Headers into the loadable segment, looks like I missed updating the comment when adding omagic and nmagic.

Perhaps update the comment to:

// When the SECTIONS command is used, try to find an address for the file and
// program headers output sections, which can be added to the first PT_LOAD segment
// when program headers are created.
1072

What I was worrying about was whether the headers and program headers were mapped into the first PT_LOAD as this was something that allocateHeaders() took account of. However it looks like this isn't a problem as createPhdrs() doesn't "unconditionally" add them when either omagic or nmagic is added.

MaskRay updated this revision to Diff 219656.Sep 10 2019, 9:20 PM
MaskRay marked 3 inline comments as done.

Update a comment

ELF/LinkerScript.cpp
1011

t seems like unconditional isn't true any more. In createPhdrs() nmagic and omagic don't add the ELF Headers into the loadable segment

nmagic and omagic were changed in D67324.

Applied the suggested comments.

Apologies for the delay in responding. I've gone through in a bit more detail. Just to make sure I understand and hopefully give an alternative explanation to anyone else following:

  • Create an example with no .rodata, like basic-aarch64 for example:
.text
        .globl _start
        .type _start, %function
_start: 
        ret
  • createPhdrs() will create and allocate the ELF to the RO program header as it is the first PT_LOAD.
  • InputSections are assigned to OutputSections, there are no InputSections assigned to the RO program segment so it contains no SHF_ALLOC sections.
  • fixSectionAlignments assigns alignTo(script->getDot(), config->maxPageSize) + (script->getDot() % config->maxPageSize) to addrExpr for both the ELF Header SyntheticSection (RO program segment) and .text (Executable program segment)
  • AssignAddresses sets initial dot to 0x200000 (AArch64) + sizeof(headers) = 0x200120
  • AssignAddresses sets base address of .text to 0x210120 (as .text is SHF_ALLOC), the addrExpr for the ELF Header SyntheticSection (RO program header) is never used.
  • allocateHeaders searches for the first SHF_ALLOC section and finds .text as there are no SHF_ALLOC sections in the RO program segment.
  • allocateHeaders sets the ELF header to the address of the .text section
  • The RO program segment now overlaps the Executable program segment as the ELF Header is in the Executable program segment.

My understanding of the proposed solution is to not call allocateHeaders() and instead, effectively, force them into the RO program segment as we always know it will be first. This permits some simplification of the code. I think your comments on -Ttext aren't directly related to the code change, more of an observation. Let me know if I have that right?

One downside of this approach, albeit not a regression from the old behaviour, is that there is always an RO program segment. An alternative fix would be to detect that a program segment only contained the ELF Header and Program Header and move them to the first program segment that contained a SHF_ALLOC section. This would simplify the output when there is no .rodata to something like (AArch64 values simulated by producing a file with just rodata):

PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0000e0 0x0000e0 R   0x8
LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000124 0x000124 RE 0x10000

Any thoughts?

For the bit about whether PT_PHDR should be generated. The ELF spec says in the PT_PHDR description:

Moreover, it may occur only if the program header table is part of the memory image of the program. If it is present, it must precede any loadable segment entry. See ``Program Interpreter'' below for more information.

ld.bfd seems to generate PT_PHDR if a PT_INTERP is needed. When I made a simple a.o from int main(void) { return 0; } I got a PT_PHDR from ld.bfd It is possible that in your example:

ld.bfd -o a -Ttext=0x3000 a.o => no PT_PHDR
ld.bfd -o a -Ttext=0x3000 a.o dummy.so (--enable-separate-code [1]) => place .note.gnu.property at 0x4000e8.

your a.o might not have enough in it to generate a PT_INTERP so no PT_PHDR.

I think that as long as our default linker script without -nmagic or -omagic can allocate headers which I think your change ensures I think it is fine for LLD to generate PT_PHDR. I consider not producing it due to no PT_INTERP an optimization.

Apologies for the delay in responding. I've gone through in a bit more detail. Just to make sure I understand and hopefully give an alternative explanation to anyone else following:

No worry. Thanks for the detailed explanation!

  • createPhdrs() will create and allocate the ELF to the RO program header as it is the first PT_LOAD.
  • InputSections are assigned to OutputSections, there are no InputSections assigned to the RO program segment so it contains no SHF_ALLOC sections.
  • fixSectionAlignments assigns alignTo(script->getDot(), config->maxPageSize) + (script->getDot() % config->maxPageSize) to addrExpr for both the ELF Header SyntheticSection (RO program segment) and .text (Executable program segment)
  • AssignAddresses sets initial dot to 0x200000 (AArch64) + sizeof(headers) = 0x200120
  • AssignAddresses sets base address of .text to 0x210120 (as .text is SHF_ALLOC), the addrExpr for the ELF Header SyntheticSection (RO program header) is never used.
  • allocateHeaders searches for the first SHF_ALLOC section and finds .text as there are no SHF_ALLOC sections in the RO program segment.
  • allocateHeaders sets the ELF header to the address of the .text section
  • The RO program segment now overlaps the Executable program segment as the ELF Header is in the Executable program segment.

I'll update the description with some information here.

My understanding of the proposed solution is to not call allocateHeaders() and instead, effectively, force them into the RO program segment as we always know it will be first. This permits some simplification of the code. I think your comments on -Ttext aren't directly related to the code change, more of an observation. Let me know if I have that right?

Yes. The main observation is that when the SECTIONS command is not used, we
don't have to call allocateHeaders. This requires an assumption that the
presence of PT_PHDR and addresses of headers can be decided regardless of
address information. You are right. -T<section> is a consequence of the assumption.

One downside of this approach, albeit not a regression from the old behaviour, is that there is always an RO program segment. An alternative fix would be to detect that a program segment only contained the ELF Header and Program Header and move them to the first program segment that contained a SHF_ALLOC section. This would simplify the output when there is no .rodata to something like (AArch64 values simulated by producing a file with just rodata):

PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0000e0 0x0000e0 R   0x8
LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000124 0x000124 RE 0x10000

Any thoughts?

The RO PT_LOAD is allocated w/o or w/ this change. We can apply the optimization, with the change to createPhdrs:

--- a/ELF/Writer.cpp
+++ b/ELF/Writer.cpp
@@ -2116,3 +2116,6 @@ std::vector<PhdrEntry *> Writer<ELFT>::createPhdrs(Partition &part) {
         sec == relroEnd) {
-      load = addHdr(PT_LOAD, newFlags);
+      if (load && load->lastSec == Out::programHeaders && (newFlags & PF_R))
+        load->p_flags = newFlags;
+      else
+        load = addHdr(PT_LOAD, newFlags);
       flags = newFlags;

It will affect 125 tests, though, so I'm not sure whether we want to apply this optimization.
A user can use --no-rosegment to ensure the RO PT_LOAD does not exist.

For the bit about whether PT_PHDR should be generated. The ELF spec says in the PT_PHDR description:

Moreover, it may occur only if the program header table is part of the memory image of the program. If it is present, it must precede any loadable segment entry. See ``Program Interpreter'' below for more information.

ld.bfd seems to generate PT_PHDR if a PT_INTERP is needed. When I made a simple a.o from int main(void) { return 0; } I got a PT_PHDR from ld.bfd It is possible that in your example:

ld.bfd -o a -Ttext=0x3000 a.o => no PT_PHDR
ld.bfd -o a -Ttext=0x3000 a.o dummy.so (--enable-separate-code [1]) => place .note.gnu.property at 0x4000e8.

your a.o might not have enough in it to generate a PT_INTERP so no PT_PHDR.

I think that as long as our default linker script without -nmagic or -omagic can allocate headers which I think your change ensures I think it is fine for LLD to generate PT_PHDR. I consider not producing it due to no PT_INTERP an optimization.

Confirmed that PT_PHDR will be created if .interp (PT_INTERP) exists. The ld.bfd examples are not directly related to the code change. I'll replace them with a brief mention of the problem.

MaskRay updated this revision to Diff 219849.Sep 11 2019, 11:14 PM
MaskRay edited the summary of this revision. (Show Details)

Update description

One downside of this approach, albeit not a regression from the old behaviour, is that there is always an RO program segment. An alternative fix would be to detect that a program segment only contained the ELF Header and Program Header and move them to the first program segment that contained a SHF_ALLOC section. This would simplify the output when there is no .rodata to something like (AArch64 values simulated by producing a file with just rodata):

PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0000e0 0x0000e0 R   0x8
LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000124 0x000124 RE 0x10000

Any thoughts?

The RO PT_LOAD is allocated w/o or w/ this change. We can apply the optimization, with the change to createPhdrs:

--- a/ELF/Writer.cpp
+++ b/ELF/Writer.cpp
@@ -2116,3 +2116,6 @@ std::vector<PhdrEntry *> Writer<ELFT>::createPhdrs(Partition &part) {
         sec == relroEnd) {
-      load = addHdr(PT_LOAD, newFlags);
+      if (load && load->lastSec == Out::programHeaders && (newFlags & PF_R))
+        load->p_flags = newFlags;
+      else
+        load = addHdr(PT_LOAD, newFlags);
       flags = newFlags;

It will affect 125 tests, though, so I'm not sure whether we want to apply this optimization.
A user can use --no-rosegment to ensure the RO PT_LOAD does not exist.

I think the number of people that have no rodata at all in a executable or shared object with allocated headers will be sufficiently small that it isn't worth updating 125 tests for. I'm happy enough with the change now that I understand it better. I think it will be worth waiting a few days to see if there are any more opinions, and if there aren't any go ahead with this next week.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 16 2019, 12:05 AM
This revision was automatically updated to reflect the committed changes.