Page MenuHomePhabricator

[ELF] Respect output section alignment for AT> (non-null lmaRegion)
ClosedPublic

Authored by MaskRay on Sat, Feb 8, 12:15 PM.

Details

Summary

When lmaRegion is non-null, respect the output section alignment.
This rule is analogous to switchTo(sec) which advances sh_addr (VMA).

This fixes the p_paddr misalignment issue as reported by
https://android-review.googlesource.com/c/trusty/external/trusted-firmware-a/+/1230058

linkerscript/align-lma.s has a FIXME that demonstrates another bug:
.bss ... >RAM should be placed in a different PT_LOAD (GNU ld
behavior) because its lmaRegion (nullptr) is different from the previous
section's lmaRegion (ROM).

Diff Detail

Event Timeline

MaskRay created this revision.Sat, Feb 8, 12:15 PM
MaskRay updated this revision to Diff 243407.Sat, Feb 8, 12:58 PM

Improve test

MaskRay updated this revision to Diff 243449.Sun, Feb 9, 8:51 AM

Rename align-lma.test to lma-align.test and improve a FIXME comment.

It looks fine to me. Perhaps please wait a bit to see if other people have an opinion on this.
Have a few minor nits/questions though.

lld/test/ELF/linkerscript/lma-align.test
4

3 nits:

  1. The space is missing .byte 0;.data; -> .byte 0; .data;
  2. It is probably might be a matter of taste or may be related to my personal workflow, but still: the constructions in test cases which use | like here do make things a bit more difficult to debug. I.e. the asm file is not created, but sometimes it is useful to have it. Up to you probably as it is not that hard to change when needed anyways.
  3. Pardon my ignorance, but why .space 1 for .bss and .byte 0 for asm? Does it have a different meaning?
19

If you have a plan to fix it soon, it is fine, otherwise may be worth to create a bug and add a link here?

(nit: add a : after FIXME)

30

Missing a full stop here and in the comment above.

grimar added inline comments.Mon, Feb 10, 1:03 AM
lld/test/ELF/linkerscript/lma-align.test
4
  1. for asm -> for .data
grimar added inline comments.Mon, Feb 10, 2:04 AM
lld/test/ELF/linkerscript/lma-align.test
19

Ah, you've fixed it in D74297. I did not see it at the moment of writing this comment.

psmith added a subscriber: psmith.Mon, Feb 10, 2:06 AM

Do you happen to know how this relates to the ALIGN_WITH_INPUT linker script keyword? Apologies I've got stuck in an airport due to flight cancellations and can't check easily. I had thought that in BFD if you wanted InputSection alignment to affect LMA you needed to add ALIGN_WITH_INPUT.

For example, taken from one of the few pages I could find that mentioned it: https://www.openstm32.org/forumthread4866

don’t forget to use ALIGN_WITH_INPUT section attribute, or you’d have aligned sections slipping if previous section doesn’t completely fill the aligned size.
...
  /* The startup code goes first into FLASH */
  .isr_vector : ALIGN_WITH_INPUT
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH AT > AXIFLASH

A brief look at BFD implies that LMA is only aligned for an OS that goes into a memory region if ALIGN_WITH_INPUT is true. I suspect that not aligning LMA can give some size savings, for example if at load time the OutputSections are byte-copied from their LMA to their aligned VMA then it doesn't matter if they are not aligned in LMA.

If I'm right it may be worth following BFD here as if we make ALIGN_WITH_INPUT the default, which is safe, then we can't then introduce ALIGN_WITH_INPUT without risking breaking some programs.

MaskRay updated this revision to Diff 243592.Mon, Feb 10, 9:08 AM
MaskRay marked 4 inline comments as done.

Address comments

lld/test/ELF/linkerscript/lma-align.test
4

.zero 1, .space 1 and .byte 0 (zero initializers) have the same effect. I'll use .byte 0 consistently.

A SHT_NOBITS section cannot have a non-zero initializer.

MaskRay added a comment.EditedMon, Feb 10, 12:01 PM

Do you happen to know how this relates to the ALIGN_WITH_INPUT linker script keyword? Apologies I've got stuck in an airport due to flight cancellations and can't check easily. I had thought that in BFD if you wanted InputSection alignment to affect LMA you needed to add ALIGN_WITH_INPUT.

For example, taken from one of the few pages I could find that mentioned it: https://www.openstm32.org/forumthread4866

don’t forget to use ALIGN_WITH_INPUT section attribute, or you’d have aligned sections slipping if previous section doesn’t completely fill the aligned size.
...
  /* The startup code goes first into FLASH */
  .isr_vector : ALIGN_WITH_INPUT
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH AT > AXIFLASH

A brief look at BFD implies that LMA is only aligned for an OS that goes into a memory region if ALIGN_WITH_INPUT is true. I suspect that not aligning LMA can give some size savings, for example if at load time the OutputSections are byte-copied from their LMA to their aligned VMA then it doesn't matter if they are not aligned in LMA.

If I'm right it may be worth following BFD here as if we make ALIGN_WITH_INPUT the default, which is safe, then we can't then introduce ALIGN_WITH_INPUT without risking breaking some programs.

Thanks for mentioning ALIGN_WITH_INPUT. I investigated it a bit. The following is my reconstruction of its logic after omitting unimportant things:

if (!bfd_is_abs_section (os->bfd_section)) {
  if (os->addr_tree) // output_section_address is specified
    section_alignment = exp_get_power (os->section_alignment, "section alignment");
  else {
    ...
    newdot = os->region->current;
    /// Maximum of input section alignments
    section_alignment = os->bfd_section->alignment_power;
  }

  if (section_alignment > 0) {
    dotdelta = align_power (newdot, section_alignment) - newdot;
    if (dotdelta != 0 && os->addr_tree)
      einfo("warning: changing start of section %s by %lu bytes", ...);
  }

  os->bfd_section->vma = os->bfd_section->lma = newdot;
}

if (os->load_base) {
  bfd_vma lma = exp_get_abs_int (os->load_base, 0, "load base");
  os->bfd_section->lma = lma;
} else if (os->lma_region != NULL) {
  bfd_vma lma = os->lma_region->current;

  if (os->align_lma_with_input) // ALIGN_WITH_INPUT
    lma += dotdelta;
  else {
    if (os->lma_region != os->region)
      section_alignment = exp_get_power (os->section_alignment, "section alignment");
    if (section_alignment > 0)
      lma = align_power (lma, section_alignment);
  }
  os->bfd_section->lma = lma;
}

lld does not support ALIGN_WITH_INPUT.

I think BFD either uses maximum input section alignments (addr_tree == NULL), or the ALIGN value (addr_tree != NULL) (not investigated carefully). lld's sec->alignment is the maximum of both:

// LinkerScript.cpp
    // Handle align (e.g. ".foo : ALIGN(16) { ... }").
    if (sec->alignExpr)
      sec->alignment =
          std::max<uint32_t>(sec->alignment, sec->alignExpr().getValue());

I think max can be dropped.

ctx->lmaOffset = alignTo(mr->curPos, sec->alignment) - dot; should work and capture the intention of the BFD's logic quite well.

A list of changes we need to make:

  • D74286 this patch
  • D74297 Start a new PT_LOAD if LMA region is different
  • Add a warning changing start of section .foo by 15 bytes when sec->addrExpr is set
  • (Optionally) address a difference when the output section address is not specified. .data . : { *(.data) } If input .data has an alignment of 16. BFD can place .data at a misaligned place. lld always respects ctx->outSec->alignment in switchTo(sec) and places .data at an aligned place. It may just waste a few bytes if the script uses data commands like .data . : { BYTE(0); *(.data) } (dot%16=15). I can create a patch if you think this is worth improving.
MaskRay updated this revision to Diff 243903.Tue, Feb 11, 9:34 AM

Forgot to change .byte 0;.data; -> .byte 0; .data;

Looks good? :)

Looks good? :)

I'll take a look tomorrow, apologies not had a lot of spare time today.

OK, to check my understanding. Tracing through that bit of BFD code it seems like in the absence of ALIGN_WITH_INPUT the VMA is aligned to the OutputSection alignment when:

  • (os->addr_tree) // output_section_address is specified
  • if (os->lma_region != os->region) section_alignment = exp_get_power (os->section_alignment, "section alignment");

In all other cases it uses the max of input section alignment. With ALIGN_WITH_INPUT we force the use of dotdelta, which is align by the same alignment as the dot got aligned. I think that this is subtly different to the section alignment as the VMA and LMA could start from different bases.

For LLD we are using the max of input_section_alignment and output section alignment. Which aligns more than bfd, as there is a possibility in bfd that output section alignment is lower than the max of input section alignment. For example running the test case using ld.bfd using arm-none-eabi-ld (only one I had on the machine so I had to replace ret with a 4-byte sized arm instruction).

LOAD           0x000000 0x00000000 0x00000000 0x01004 0x01004 R E 0x10000
LOAD           0x011000 0x00011000 0x00001004 0x00001 0x00001 RW  0x10000
LOAD           0x011010 0x00011010 0x00001010 0x00001 0x00001 RW  0x10000
LOAD           0x011040 0x00011040 0x00011040 0x00000 0x00001 RW  0x10000

Note that .data.rel.ro despite having an alignment of 16, hasn't had the LMA aligned. This could matter to some embedded systems people wanting to squeeze every last byte out of a tiny microcontroller's flash, but not likely to be a problem for the firmware on something that is booting linux. My concern is that if we adopt the strict behaviour it will be difficult to go back without breaking existing linker scripts.

One argument that we can change this in the future is that the vast majority of embedded projects use ld.bfd and will be used to its behaviour, so we can conform more tightly to it in the future.

MaskRay added a comment.EditedWed, Feb 12, 7:41 AM

OK, to check my understanding. Tracing through that bit of BFD code it seems like in the absence of ALIGN_WITH_INPUT the VMA is aligned to the OutputSection alignment when:

  • (os->addr_tree) // output_section_address is specified
  • if (os->lma_region != os->region) section_alignment = exp_get_power (os->section_alignment, "section alignment"); In all other cases it uses the max of input section alignment. With ALIGN_WITH_INPUT we force the use of dotdelta, which is align by the same alignment as the dot got aligned. I think that this is subtly different to the section alignment as the VMA and LMA could start from different bases.

    For LLD we are using the max of input_section_alignment and output section alignment. Which aligns more than bfd, as there is a possibility in bfd that output section alignment is lower than the max of input section alignment. For example running the test case using ld.bfd using arm-none-eabi-ld (only one I had on the machine so I had to replace ret with a 4-byte sized arm instruction). ` LOAD 0x000000 0x00000000 0x00000000 0x01004 0x01004 R E 0x10000 LOAD 0x011000 0x00011000 0x00001004 0x00001 0x00001 RW 0x10000 LOAD 0x011010 0x00011010 0x00001010 0x00001 0x00001 RW 0x10000 LOAD 0x011040 0x00011040 0x00011040 0x00000 0x00001 RW 0x10000 ` Note that .data.rel.ro despite having an alignment of 16, hasn't had the LMA aligned. This could matter to some embedded systems people wanting to squeeze every last byte out of a tiny microcontroller's flash, but not likely to be a problem for the firmware on something that is booting linux. My concern is that if we adopt the strict behaviour it will be difficult to go back without breaking existing linker scripts.

    One argument that we can change this in the future is that the vast majority of embedded projects use ld.bfd and will be used to its behaviour, so we can conform more tightly to it in the future.

Thanks for confirming.

I am not too concerned about lld align more than bfd tentatively. I believe such LMA overalignment just wastes some bytes. It should cause more problems.

But as https://reviews.llvm.org/D74286#1867852 wrote, after this patch and D74297 are submitted, I'll check how to fix overalignment.

@nickdesaulniers @srhines Any projects using lld AT> (LMA memory regions) should be given a bit more caution, as there are still a number of differences in this area. Hope the dependent projects can be adjusted instead of adding workarounds to lld :)

psmith accepted this revision.Wed, Feb 12, 7:55 AM

OK, the overalignment is one of those don't care for the vast majority of users, and really important to a small number of people. It most often causes a problem when something is heavily aligned in VMA, but as it is copied from ROM/Flash its alignment doesn't matter in LMA.
Anyhow if it is something we can look at fixing later then LGTM as this will solve the immediate problem.

This revision is now accepted and ready to land.Wed, Feb 12, 7:55 AM
This revision was automatically updated to reflect the committed changes.