This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Simplify sh_addr computation and warn if sh_addr is not a multiple of sh_addralign
ClosedPublic

Authored by MaskRay on Mar 5 2020, 10:01 PM.

Details

Summary

See docs/ELF/linker_script.rst for the new computation for sh_addr and sh_addralign. ALIGN(section_align) now means: "increase alignment to section_align"

The "start of section .foo changes from 0x11 to 0x20" warning no longer
makes sense. Change it to warn if sh_addr%sh_addralign!=0.

To decrease the alignment from the default max_input_align,
use .output ALIGN(8) : {} instead of .output : ALIGN(8) {}
See linkerscript/section-address-align.test as an example.

When both an output section address and ALIGN are set (can be seen as an
"undefined behavior" https://sourceware.org/ml/binutils/2020-03/msg00115.html),
lld may align more than GNU ld, but it makes a linker script working
with GNU ld hard to break with lld.

This patch can be considered as restoring part of the behavior before D74736.

Diff Detail

Event Timeline

MaskRay created this revision.Mar 5 2020, 10:01 PM
MaskRay edited the summary of this revision. (Show Details)Mar 5 2020, 10:02 PM
psmith added a comment.Mar 6 2020, 9:32 AM

In theory I don't mind that we deviate from GNU ld where it makes sense, however given that our Linker Script documentation is the GNU ld docs, and that has considerable gaps, I'd be much more comfortable in approving if we had more user facing documentation where we document the differences. I know that this has come up before but we didn't take any action. Is it worth starting now, even if we only document this one difference and add to it over time? This could either be in our existing man page and/or in the official LLVM docs.

MaskRay added a comment.EditedMar 6 2020, 11:35 AM

In theory I don't mind that we deviate from GNU ld where it makes sense, however given that our Linker Script documentation is the GNU ld docs, and that has considerable gaps, I'd be much more comfortable in approving if we had more user facing documentation where we document the differences. I know that this has come up before but we didn't take any action. Is it worth starting now, even if we only document this one difference and add to it over time? This could either be in our existing man page and/or in the official LLVM docs.

Yeah we need a documentation.

I made these alignment changes because of https://reviews.llvm.org/D74286#1872329 .

I thought about the alternative to simply let ALIGN override max_input_align. After rereading https://bugs.llvm.org/show_bug.cgi?id=45023#c9 , I give up that idea because .output ALIGN(8) : {} can override max_input_align.

Now, our discrepancy with GNU ld is:

  • When both an output section address and ALIGN are set, we use max(ALIGN,max_input_align) while GNU ld uses ALIGN. This is an "undefined behavior".
  • In GNU ld>=2.35, an output section address can decrease sh_addralign. This is to conform to the ELF spec but my feeling is that this creates a non-conforming/strange section intentionally. I think we should keep sh_addralign to reflect the max of input sections. This also makes problems more obvious.

These are corner cases and undocumented on It is documented in https://sourceware.org/binutils/docs/ld/ . No correct linker script is affected.

MaskRay updated this revision to Diff 248947.Mar 7 2020, 10:21 AM
MaskRay edited the summary of this revision. (Show Details)
psmith added a comment.Mar 8 2020, 8:39 AM

If the ld.bfd developers consider both [address] and [ALIGN(section_align)] as undefined behaviour then I'm happy that we can simplify and the above change does implement that. Would like to hear what some of the other reviewers think as well.

For the documentation may I suggest we add something like this to ld.lld.1

Linker Script implementation notes and policy.

LLD implements a large subset of the GNU ld linker script notation. The LLD
implementation policy is to implement linker script features as they are
documented in the ld manual https://sourceware.org/binutils/docs/ld/Scripts.html
We consider it a bug if the lld implementation does not agree with the manual
and it is not mentioned in the exceptions below.

The ld manual is not a complete specification, and is not sufficient to build
an implementation. In particular some features are only defined by the
implementation and have changed over time.

The lld implementation policy for properties of linker scripts that are not
defined by the documentation is to follow the GNU ld implementation wherever
possible. We reserve the right to make different implementation choices where
it is appropriate for LLD. Intentional deviations will be documented here:

Linker Script deviations

Status: This is under construction and is not complete.

Output Section ALIGN

When an OutputSection S has both [address] and [ALIGN(section_align)]
LLD will set the alignment of S to the maximum of [ALIGN(section_align)] and
the maximum alignment of the input sections in S. GNU ld will always use
[ALIGN(section_align)].

When an OutputSection S has [address] LLD will warn if this not 0 modulo
OutputSection alignment. The ELF file will set sh_addr to [address] and
sh_addralign to OutputSection alignment even if this would produce a
non-compliant ELF file. GNU ld from Binutils 2.35 onwards will reduce
sh_addralign so that sh_addr = 0 (modulo sh_addralign).

We could add to this as we make more decisions, I've kept to just those made in this review here. We could also add a known limitations for the areas where we haven't implemented corner cases.

MaskRay added a comment.EditedMar 8 2020, 9:29 AM

If the ld.bfd developers consider both [address] and [ALIGN(section_align)] as undefined behaviour then I'm happy that we can simplify and the above change does implement that. Would like to hear what some of the other reviewers think as well.

For the documentation may I suggest we add something like this to ld.lld.1

Linker Script implementation notes and policy.

LLD implements a large subset of the GNU ld linker script notation. The LLD
implementation policy is to implement linker script features as they are
documented in the ld manual https://sourceware.org/binutils/docs/ld/Scripts.html
We consider it a bug if the lld implementation does not agree with the manual
and it is not mentioned in the exceptions below.

The ld manual is not a complete specification, and is not sufficient to build
an implementation. In particular some features are only defined by the
implementation and have changed over time.

The lld implementation policy for properties of linker scripts that are not
defined by the documentation is to follow the GNU ld implementation wherever
possible. We reserve the right to make different implementation choices where
it is appropriate for LLD. Intentional deviations will be documented here:

Linker Script deviations

Status: This is under construction and is not complete.

Output Section ALIGN

When an OutputSection S has both [address] and [ALIGN(section_align)]
LLD will set the alignment of S to the maximum of [ALIGN(section_align)] and
the maximum alignment of the input sections in S. GNU ld will always use
[ALIGN(section_align)].

When an OutputSection S has [address] LLD will warn if this not 0 modulo
OutputSection alignment. The ELF file will set sh_addr to [address] and
sh_addralign to OutputSection alignment even if this would produce a
non-compliant ELF file. GNU ld from Binutils 2.35 onwards will reduce
sh_addralign so that sh_addr = 0 (modulo sh_addralign).

We could add to this as we make more decisions, I've kept to just those made in this review here. We could also add a known limitations for the areas where we haven't implemented corner cases.

Thanks for the documentation! I think the paragraphs may be too long for docs/ld.lld.1. May I suggest you create a patch to add Linker Script implementation notes and policy to docs/ELF/linker_script.rst? I can add Output Section ALIGN in this patch. I don't want to take credit for the important policy part.

MaskRay updated this revision to Diff 249009.Mar 8 2020, 10:01 AM

Add docs/ELF/linker_script.rst

MaskRay edited the summary of this revision. (Show Details)Mar 8 2020, 10:21 AM
psmith added a comment.Mar 9 2020, 1:55 PM

Thanks for the documentation! I think the paragraphs may be too long for docs/ld.lld.1. May I suggest you create a patch to add Linker Script implementation notes and policy to docs/ELF/linker_script.rst? I can add Output Section ALIGN in this patch. I don't want to take credit for the important policy part.

Sure I'll do that tomorrow morning, apologies I didn't get to it today.

Created D75921 to create the initial contents of linker_script.rst
While trying to build it I ran into https://bugs.llvm.org/show_bug.cgi?id=41789 as I had too new a version of Sphinx. The solution in the PR worked for the LLD docs, although the overall LLVM docs build failed for a different (deprecation of something in latest Sphinx) in an unrelated file. If this happens to you I suggest setting SPHINX_EXECUTABLE=/path/to/pre_2.0_sphinx_build in your cmake command.

I wholeheartedly endorse documenting the policy on linker script semantics. I think the described deviation from GNU semantics for cases where explicit output section alignment doesn't jibe with input section alignment is generally fine. That is, I think this corner case is best avoided and I'm not too concerned with exactly how it goes when it happens. The warning for the case of an explicit address that is misaligned wrt input section sh_addralign is the most important thing for avoiding accidents. For the explicit alignment being less than than input requirements, the lld behavior of using the maximum of the two seems sensible to me (essentially the explicit alignment in the linker script is just another of the requirements like all the inputs' sh_addralign values and you align to the maximum of the whole set, which is perfectly intuitive). If anything, I think the GNU linkers should warn about violating input sh_addralign requirements if they're going to have that behavior.

In my case, the change in behavior for the explicit alignment < input alignment case revealed a bug in my code producing the input sections and showed me that I really wanted to reorganize my linker script so it was testing the input section alignment with asserts rather than controlling it explicitly in the output section. So if anything that suggests warning for any instance of an explicit alignment not matching up with input requirements. But it seems likely that the most common use of explicit alignments is exactly when setting the minimum alignment of the output section is the intent and that being increased by input sh_addralign requirements is normal and expected, and not a case like mine.

psmith accepted this revision.Mar 11 2020, 4:59 AM

Now that we have some documentation in place to record the decision, and I've not seen any objections to the simplification, I'm happy to approve the change.

This revision is now accepted and ready to land.Mar 11 2020, 4:59 AM
This revision was automatically updated to reflect the committed changes.