This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Fix program header alloc when first PT_LOAD is not at lowest VMA
Needs Review · Public

Authored by pattop on Jan 21 2021, 7:03 PM.

Details

Summary

Previously lld attempted to allocate program headers before the lowest
VMA in the program image.

This patch changes the behaviour to allocate program headers in their
associated load segment. This will only happen if specified in a linker
script.

The new offset-headers.s test case previously failed with:

ld.lld: error: could not allocate headers

This improves compatibility with ld.bfd scripts.

Event Timeline

pattop created this revision. Jan 21 2021, 7:03 PM
pattop requested review of this revision. Jan 21 2021, 7:03 PM
Herald added a project: Restricted Project. Jan 21 2021, 7:03 PM
MaskRay added a comment. Edited Jan 22 2021, 4:57 PM

This adds complexity to the linker script logic. The rule looks a bit arbitrary and doesn't make me feel comfortable, so I'd like to ask a few questions:

  • Does this solve a category of linker script problems?
  • Can you adapt your linker script to run with both GNU ld and LLD?
  • If not, do you think there is some exotic part which can be changed without affecting functionality?
  • If we don't support such usage, does that affect a large number of users? (Assuming they have willingness to adapt their linker scripts)
  • If we add this, do we still have explainable behaviors?

Thank you for taking the time to review and comment.

The key requirement here is the ability to control the placement of program headers.

In my experience this is often encountered when dealing with nommu embedded systems, especially when linking for execute-in-place. In this case it's desirable to have program headers reside in flash without loading them into RAM.

On these systems the address of ROM/RAM cannot be controlled. Often ROM is at a higher address than RAM, and there are even systems with RAM at address 0 (e.g. ITCM on Cortex-M7).

I'm not sure where the requirement comes from that the first load segment must cover the program headers. GNU ld fails with the error "PHDRS and FILEHDR are not supported when prior PT_LOAD headers lack them" if this is not met. I did the same without thinking about whether this is really necessary. Maybe it is simply a limitation of GNU ld?

I have not been able to craft a linker script for LLD which can place the program headers somewhere other than the lowest VMA in the program. I've had (unpatched) LLD create multi-gigabyte ELF files with overlapping load segments while trying.

To directly address your questions:

Does this solve a category of linker script problems?

Yes, generally nommu embedded and execute-in-place use cases.

Can you adapt your linker script to run with both GNU ld and LLD?

I don't think so. In general GNU ld works without any problems. I haven't come up with a solution which works with LLD without patches.

If not, do you think there is some exotic part which can be changed without affecting functionality?

I'm not sure, sorry.

If we don't support such usage, does that affect a large number of users? (Assuming they have willingness to adapt their linker scripts)

I think this certainly limits the usability of LLD in nommu/deeply embedded/XIP scenarios. I can't comment on how many users this may be.

If we add this, do we still have explainable behaviors?

I don't think there will be any difficulties explaining the behaviours if they are similar enough (or equivalent) to GNU ld.

I'm not clear on what the requirements are here. I would be very interested to see a reference to placing headers in the first loadable program segment. That sounds like it could be a convention of some linker/platform, but I can't remember seeing that in any specification.

The best reference I've found is in Levine's Linkers and Loaders book (from 2000) where it talks about

"ELF files extend the 'header in the address space' trick used in QMAGIC a.out files to make the executable file as compact as possible at the cost of some unused space in the address space."

Looking up QMAGIC it says:

 Compact pageable files consider the a.out header to be part of the text segment, because there's no particular reason that the code in the text segment has to start at location zero.
...
The code actually starts immediately after the header and the whole page is mapped into the second page of the process, leaving the first page unmapped so that pointer references to location zero will fail. This has the harmless side effect of mapping the header into the process as well.

My understanding of embedded systems (mostly deeply embedded with at most an RTOS) is that the loadable segments are extracted from the ELF file and burned into ROM/Flash. The program never refers to the ELF header and program headers. It sounds like you have a requirement to place the ELF header and program headers in an arbitrary segment?

If so, rather than alter the QMAGIC convention, I think it would be better to use the PHDRS linker script command https://sourceware.org/binutils/docs/ld/PHDRS.html I believe that permits the headers to be allocated to a user controlled segment. If LLD's PHDRS support isn't good enough (compared to BFD) I'd prefer we improved it.

I'm not clear on what the requirements are here.

The ability to control the segment in which the program headers reside.

BFD ld requires that they are in the first loadable segment. But it does not require that the first loadable segment is at the lowest address, or that the segments are in any particular order.

I would be very interested to see a reference to placing headers in the first loadable program segment. That sounds like it could be a convention of some linker/platform, but I can't remember seeing that in any specification.

The best reference I've found is in Levine's Linkers and Loaders book (from 2000) where it talks about

"ELF files extend the 'header in the address space' trick used in QMAGIC a.out files to make the executable file as compact as possible at the cost of some unused space in the address space."

Looking up QMAGIC it says:

 Compact pageable files consider the a.out header to be part of the text segment, because there's no particular reason that the code in the text segment has to start at location zero.
...
The code actually starts immediately after the header and the whole page is mapped into the second page of the process, leaving the first page unmapped so that pointer references to location zero will fail. This has the harmless side effect of mapping the header into the process as well.

Thanks for the references. I also don't know where this originally comes from.

My understanding of embedded systems (mostly deeply embedded with at most an RTOS) is that the loadable segments are extracted from the ELF file and burned into ROM/Flash. The program never refers to the ELF header and program headers. It sounds like you have a requirement to place the ELF header and program headers in an arbitrary segment?

In this particular system there is a small RTOS with a program loader which can load ELF images. The system has a small RAM at address 0 (ITCM on a Cortex-M7) into which some critical functions need to be loaded. The rest of the program is loaded into a DRAM at some other location.

LLD currently forces the program headers to reside at the lowest address (in this case, address 0) which means that the program no longer fits into the available space.

If so, rather than alter the QMAGIC convention, I think it would be better to use the PHDRS linker script command https://sourceware.org/binutils/docs/ld/PHDRS.html I believe that permits the headers to be allocated to a user controlled segment. If LLD's PHDRS support isn't good enough (compared to BFD) I'd prefer we improved it.

Improving LLD's PHDRS support is what I was trying to do with this patch.

With the change in place program headers are allocated into the first loadable segment if it exists. This behaviour is close to BFD ld, but BFD ld can also put the headers into multiple segments (but each prior loadable segment must also include them).

This change alone is not enough to fully solve the problem. D95199 is also required for the link to succeed.

Thanks for the update and the link to D95199. Thanks also for pointing out that this is reliant on the PHDRS command; that wasn't clear from the description (I can see it in the test now). I'll try to take a deeper look at the weekend. Apologies, I can only do this in my spare time and it is a busy week at the moment.

It would be good to update the description. Out of interest, does using --nmagic help? LLD defaults to --no-nmagic, which assumes the ELF file will be paged in by an OS and allocates headers into the first loadable segment. It would be good to make sure that --nmagic is supported, as that is likely to be used in embedded systems.

I agree that the description could be better. Looks like I fixated on the "could not allocate headers" error message when I wrote it.

I think --nmagic is unrelated. In the included test case lld is unable to allocate headers because there is a MEMORY region (mem2) located at address 0. The current logic finds the lowest VMA in the program (in this case 0) and tries to place the headers in front of that address. Turning off page alignment won't help.
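To make the failure mode described above concrete, here is a deliberately simplified Python model. It is not LLD's actual code: the function name is hypothetical, page alignment is ignored, and the header size assumes an ELF32 file with two program headers.

```python
# Simplified model of "place the headers in front of the lowest VMA".
# NOT LLD's real implementation; all names here are hypothetical.

def allocate_headers_before_lowest_vma(section_vmas, header_size):
    """Return the address where the headers would start, or None if
    there is no room in front of the lowest section address."""
    lowest = min(section_vmas)
    if header_size > lowest:
        # Nothing fits below the lowest VMA (for example when it is 0).
        return None
    return lowest - header_size


# ELF32 file header (52 bytes) plus two 32-byte program headers.
HEADER_SIZE = 52 + 2 * 32

# A region at address 0 leaves no room in front of it.
print(allocate_headers_before_lowest_vma([0x0, 0x80000000], HEADER_SIZE))

# With a higher lowest address the headers fit just below it.
print(allocate_headers_before_lowest_vma([0x800000 + HEADER_SIZE, 0x900000],
                                         HEADER_SIZE))
```

In this model the first call yields no address at all, which corresponds to the "could not allocate headers" situation, regardless of any alignment setting.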

I've also found no good reason why BFD ld forces the program headers to reside in the first PT_LOAD segment. This change was based on that logic.

Perhaps a better change would be to find a PT_LOAD segment with FILEHDR or PHDRS specified and allocate the headers there.
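A minimal Python sketch of that alternative selection rule, assuming a hypothetical in-memory representation of the PHDRS command (the `Phdr` class and `pick_header_segment` function are illustrative names, not LLD data structures):

```python
from dataclasses import dataclass


@dataclass
class Phdr:
    """One entry of a PHDRS command (hypothetical model)."""
    name: str
    p_type: str
    filehdr: bool = False
    phdrs: bool = False


def pick_header_segment(phdrs):
    """Return the first PT_LOAD entry that asks for FILEHDR or PHDRS,
    or None if no loadable segment requests the headers."""
    for p in phdrs:
        if p.p_type == "PT_LOAD" and (p.filehdr or p.phdrs):
            return p
    return None


# The segment at the lowest address (ITCM) is listed first but does not
# request the headers; the DRAM segment does.
script = [
    Phdr("itcm", "PT_LOAD"),
    Phdr("dram", "PT_LOAD", filehdr=True, phdrs=True),
]
chosen = pick_header_segment(script)
print(chosen.name if chosen else "no header segment")
```

Note that this ordering would be rejected by BFD ld, which requires every prior PT_LOAD to carry the keywords as well; the sketch only illustrates the selection idea.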

BFD ld also supports multiple PT_LOAD segments specifying FILEHDR or PHDRS. I don't know if that needs to be supported or what the use case for it is.

Apologies for the delay in responding. Is there any reason why the ELF header and program headers need to be covered by a PT_LOAD region at all? For example, suppose the ELF file is on flash somewhere and the RTOS loader reads the ELF header and program headers from the file in flash. If the program headers are not in the address range covered by any PT_LOAD, then there is no need to copy them into the TCM or SDRAM. As I understand it, with -nmagic (and no PHDRS) neither LLD nor ld.bfd will attempt to put the ELF header and program headers into a PT_LOAD segment.

Personally, if we are to change LLD in this area, I'd much prefer we did it with a good understanding of PHDRS and how it should interact with header placement, and I'm not sure we (collectively) have that yet. It may be worth asking on the binutils mailing list.

Reading the docs: https://sourceware.org/binutils/docs/ld/PHDRS.html

You may use the FILEHDR and PHDRS keywords after the program header type to further describe the contents of the segment. The FILEHDR keyword means that the segment should include the ELF file header. The PHDRS keyword means that the segment should include the ELF program headers themselves. If applied to a loadable segment (PT_LOAD), all prior loadable segments must have one of these keywords.
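The last sentence of that passage can be read as a simple ordering check. The Python sketch below is one interpretation of the documented rule, not code taken from ld.bfd; the tuple layout and function name are assumptions made for illustration.

```python
def check_header_keywords(segments):
    """segments: list of (name, p_type, keywords) in PHDRS order.

    Return the names of PT_LOAD segments that use FILEHDR or PHDRS even
    though an earlier PT_LOAD segment has neither keyword.
    """
    offenders = []
    prior_load_without_keywords = False
    for name, p_type, keywords in segments:
        if p_type != "PT_LOAD":
            continue  # the rule only concerns loadable segments
        has_keyword = bool({"FILEHDR", "PHDRS"} & set(keywords))
        if has_keyword and prior_load_without_keywords:
            offenders.append(name)
        if not has_keyword:
            prior_load_without_keywords = True
    return offenders


# Both loadable segments carry the keywords: the rule is satisfied.
ok = [("data", "PT_LOAD", ["FILEHDR", "PHDRS"]),
      ("text", "PT_LOAD", ["FILEHDR", "PHDRS"])]

# Only the second loadable segment carries them: the rule is violated.
bad = [("data", "PT_LOAD", []),
       ("text", "PT_LOAD", ["FILEHDR", "PHDRS"])]

print(check_header_keywords(ok))
print(check_header_keywords(bad))
```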

The test phdrs3a.t may be of interest: https://github.com/bminor/binutils-gdb/blob/master/ld/testsuite/ld-scripts/phdrs3a.t
It contains the following linker script:

PHDRS
{
  data PT_LOAD FILEHDR PHDRS FLAGS(4);
  text PT_LOAD FILEHDR PHDRS FLAGS(1);
}

SECTIONS
{
  /* This test will fail on architectures where the startaddress below
     is less than the constant MAXPAGESIZE.  */
  . = 0x800000 + SIZEOF_HEADERS;
  .text : { *(.text) } :text
  .data : { *(.data) } :data
  /DISCARD/ : { *(.*) }
}

The output is interesting. Both the data and text segments start at 0x800000 and include the ELF header and program headers, so the output ELF file essentially has two overlapping PT_LOAD program headers:

0x800000 (data and text PHDR start)
ELF Header
Program Header
.data
(data PHDR end)
.text
(text PHDR end)
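As a worked example of the `. = 0x800000 + SIZEOF_HEADERS` assignment in the script, the Python sketch below computes where the first output section would start. It assumes SIZEOF_HEADERS equals the ELF file header plus the program header table, with two program headers as declared in the PHDRS command; the structure sizes are the standard ELF32 and ELF64 ones, while the function name is hypothetical.

```python
# (file header size, program header entry size) in bytes
ELF_SIZES = {"ELF32": (52, 32), "ELF64": (64, 56)}


def sizeof_headers(elf_class, phnum):
    """Size of the ELF file header plus `phnum` program headers."""
    ehdr_size, phent_size = ELF_SIZES[elf_class]
    return ehdr_size + phnum * phent_size


BASE = 0x800000
for elf_class in ("ELF32", "ELF64"):
    start = BASE + sizeof_headers(elf_class, 2)
    print(elf_class, hex(start))
# prints:
# ELF32 0x800074
# ELF64 0x8000b0
```

Both segments begin at 0x800000, so the bytes between 0x800000 and the printed address are the headers that each segment maps.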

ld.bfd will give an error message if FILEHDR and PHDRS are removed from data.

To me, reversing the addresses so that the first PT_LOAD in the PHDRS command contains the FILEHDR and PHDRS but is not at the lowest address looks more like a hack that happens to work than an intentional feature.

I can think of a few times I've encountered this on nommu targets. Some examples:

One product has a first-stage bootloader which loads an ELF image from eMMC and then branches to its entry point. In this case the PHDRs need to be loaded so that the program (a small RTOS) has them available. The RTOS uses its own PHDRs while initialising its page allocator to reserve the regions where it is itself loaded, effectively a form of introspection. The device has an internal SRAM at 0x20000000, and DRAM (where the headers go) is at 0x80000000.

Another product loads DSP routines into the TCMs of a Cortex-M7 as part of a larger program. This is effectively a normal ELF file with a few extra sections linked to absolute addresses. As stated earlier ITCM is at address 0. PHDRs definitely need to be loaded here as the C runtime reads them as part of its initialisation.

Regardless of whether these are good design decisions, BFD ld has this capability and it is definitely used in the wild.
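To illustrate the introspection use case from the first example, here is a Python sketch that walks the program headers of an in-memory ELF32 little-endian image and lists the address ranges of its PT_LOAD segments. A real RTOS would do this in C on its own loaded headers; the image below is synthetic, and the segment sizes and the function name are made up for the example (only the 0x20000000 and 0x80000000 base addresses come from the description above).

```python
import struct

EHDR = struct.Struct("<16sHHIIIIIHHHHHH")  # Elf32_Ehdr, little-endian, 52 bytes
PHDR = struct.Struct("<8I")                # Elf32_Phdr, little-endian, 32 bytes
PT_LOAD = 1


def loaded_ranges(image):
    """Return [(start, end)] address ranges of all PT_LOAD segments."""
    fields = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    ranges = []
    for i in range(e_phnum):
        (p_type, _offset, p_vaddr, _paddr,
         _filesz, p_memsz, _flags, _align) = PHDR.unpack_from(
            image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            ranges.append((p_vaddr, p_vaddr + p_memsz))
    return ranges


# Synthetic image: ELF32, little-endian, ET_EXEC (2), EM_ARM (40),
# with the program header table directly after the file header.
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
ehdr = EHDR.pack(ident, 2, 40, 1, 0x80000000, 52, 0, 0, 52, 32, 2, 0, 0, 0)
dram = PHDR.pack(PT_LOAD, 0, 0x80000000, 0x80000000, 0x1000, 0x1000, 5, 0x1000)
sram = PHDR.pack(PT_LOAD, 0x1000, 0x20000000, 0x20000000, 0x200, 0x400, 6, 4)
image = ehdr + dram + sram

for start, end in loaded_ranges(image):
    print(hex(start), hex(end))
```

A page allocator could then mark each returned range as reserved before handing out memory.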

On the topic of PHDRs and header placement I think the idea that headers should be part of the first load segment follows on from the fact that the file header is always at the beginning of the file. It's then quite natural for the first load segment to start at file offset 0.

I have also seen phdrs3a.t. It originates from a discussion here https://sourceware.org/legacy-ml/binutils/2009-10/msg00023.html which led to this bug https://sourceware.org/bugzilla/show_bug.cgi?id=10744 which resulted in the test case.

I think BFD ld intentionally does not place any restrictions on the VMAs of the load segments so I disagree that this is a hack or workaround.

You can specify the exact address of an output segment using the AT(ADDRESS) syntax without any restrictions. https://sourceware.org/binutils/docs/ld/PHDRS.html says:

You can specify that a segment should be loaded at a particular address in memory by using an AT expression. This is identical to the AT command used as an output section attribute (see Output Section LMA). The AT command for a program header overrides the output section attribute.
pattop updated this revision to Diff 331468. Mar 17 2021, 10:45 PM

Reduce changes, update test, clarify intent.

pattop edited the summary of this revision. Mar 17 2021, 10:47 PM
pattop edited the summary of this revision.