This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Allow going over alignment commands in algorithm of placing non-alloc orphans.
AbandonedPublic

Authored by grimar on Feb 2 2017, 7:36 AM.

Details

Reviewers
ruiu
rafael
Summary

SECTIONS command in a linker script may not cover all possible input sections. If there are remaining
input sections after processing SECTIONS commands, they will be put into the output using default rules.
Such sections are called orphan sections.

The default rules are not strictly defined -- that is a set of heuristics. This patch tweaks the default rules a bit
so that LLD can link the Linux kernel.
(http://lxr.free-electrons.com/source/arch/x86/boot/setup.ld#L52)

Value ot _end symbol incorrectly. That was reason of "Setup too big!" assertion fail.
We placed orphans non allocatable sections before _end and so far _end value was too large.

Patch changes placement rules for non allocatable orphans, that way them are alowed to be placed
after assignments to location counter. As a result them are placed after assignment to symbols now
and script above works correctly now.

Diff Detail

Event Timeline

grimar created this revision.Feb 2 2017, 7:36 AM
ruiu edited edge metadata.Feb 2 2017, 10:57 AM

Please update the commit message to not use that many examples. In general, you want to explain the problem you are fixing and how your patch fixes that, instead of giving examples and let readers make a guess what you are doing.

For example, I'd probably start explaining this patch like this: "SECTIONS command in a linker script may not cover all possible input sections. If there are remaining input sections after processing SECTIONS commands, they will be put into the output using default rules. Such sections are called orphan sections. The default rules are not strictly defined -- that is a set of heuristics. This patch tweaks the default rules a bit so that LLD can link the Linux kernel." And then you want to explain the problem and how you fix it.

ELF/LinkerScript.cpp
686–689

This is the linker script you are trying to handle.

. = ALIGN(16);
.bss            :
{
  __bss_start = .;
  *(.bss)
  __bss_end = .;
}
. = ALIGN(16);
_end = .;

It doesn't seem to be related to SHF_ALLOC. I wonder if you want to skip a pair of an alignment and an assignment to a symbol instead.

grimar added inline comments.Feb 2 2017, 11:42 AM
ELF/LinkerScript.cpp
686–689

I think comment above says why we can not do that:

We don't want to go over alignments for SHF_ALLOC sections, since doing so in
rx_sec : { *(rx_sec) }
. = ALIGN(0x1000);
/* The RW PT_LOAD starts here*/
// rw_sec : { *(rw_sec) }

So if we just skip pair, case above will be broken.
Behavior of placing orphans is not specified, but at least what we implemented looks worked and what this
patch adds is what at least some scripts relies on. Since bfd behaves in that way.

ruiu added a comment.Feb 2 2017, 4:06 PM

I spent a fair amount of time to try to understand this, but this still looks really odd. My understanding is that, adding non-SHF_ALLOC sections doesn't change the current virtual address pointer (the '.' special variable), because non-allocatable sections are not mapped to the memory at runtime.

IIUC, what you wrote is ASSERT(_end <= 0x8000) in http://lxr.free-electrons.com/source/arch/x86/boot/setup.ld fails with LLD because LLD adds non-alloc sections after .bss but before _end symbol. But that's odd, because adding non-alloc sections shouldn't increment . variable.

What am I missing here? I would really appreciate if you describe it without too much examples, but why you think what you are doing is semantically correct.

grimar added a comment.Feb 3 2017, 2:14 AM
In D29453#665381, @ruiu wrote:

I spent a fair amount of time to try to understand this, but this still looks really odd. My understanding is that, adding non-SHF_ALLOC sections doesn't change the current virtual address pointer (the '.' special variable), because non-allocatable sections are not mapped to the memory at runtime.

It actually changes it, explanation below.

IIUC, what you wrote is ASSERT(_end <= 0x8000) in http://lxr.free-electrons.com/source/arch/x86/boot/setup.ld fails with LLD because LLD adds non-alloc sections after .bss but before _end symbol. But that's odd, because adding non-alloc sections shouldn't increment . variable.

What am I missing here? I would really appreciate if you describe it without too much examples, but why you think what you are doing is semantically correct.

I will try :) Little single sample:

If we have a script with _end after all allocatable sections and _end2 after single non-allocatable:

SECTIONS { .text : { *(.text*) }   _end = .;    .nonalloc: { *(.nonalloc*) }    _end2 = .;  }

You assume that non-alloc should not increment counter. But both BFD/gold output would be next:

[ 1] .text             PROGBITS         0000000000000000  00001000
     0000000000000001  0000000000000000  AX       0     0     4
[ 2] .nonalloc         PROGBITS         0000000000000000  00001001
     0000000000000008  0000000000000000           0     0     1

   5: 0000000000000009     0 NOTYPE  GLOBAL DEFAULT    1 _end2
   6: 0000000000000001     0 NOTYPE  GLOBAL DEFAULT    1 _end

Symbols has different values, dot variable is a location counter, it is not the same as VA, though behaves close.
Non-alloc sections affect it value too.
And we already did that (naturally , I believe). I see no reasons why we should try to
behave in different way than gnu linkers here ?

grimar edited the summary of this revision. (Show Details)Feb 3 2017, 2:20 AM
grimar edited the summary of this revision. (Show Details)
ruiu added a comment.Feb 3 2017, 1:10 PM

I did not ask you to verify that the behavior matches with GNU linkers (I was pretty sure that you already did.) What I asked is to reason about it. Can you explain why you think that is semantically correct?

grimar added a comment.Feb 4 2017, 1:05 AM
In D29453#666226, @ruiu wrote:

I did not ask you to verify that the behavior matches with GNU linkers (I was pretty sure that you already did.)

Though I think it is important. If some scripts already relies on this behavior, it is better to follow than invent own unique behavior,
incompatible or partially compatible with existent software.

What I asked is to reason about it. Can you explain why you think that is semantically correct?

Reason is that such behavior is clear and understandable. When you write script:

SECTIONS { 
. = 0x1000;
.text : { *(.text) };
/* all other sections you know about */
. = ALIGN(X);
End = .; 
}

I think you expect End to point and indicate the end of sections you aware of. And linker can create other ones, non-allocatables, even those you do not ask for,
like ".comment". And writing this script you do not have to care about that all. That is pretty expectable user-friendly behavior, no ?

grimar updated this revision to Diff 87457.Feb 7 2017, 8:33 AM
  • Simplified testcase.
grimar abandoned this revision.Mar 30 2017, 1:30 AM