This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Symbol assignment within input section list
ClosedPublic

Authored by evgeny777 on Jul 22 2016, 8:27 AM.

Details

Reviewers
ruiu
davide
Summary

Allows the following script to proceed:

SECTIONS {
   . foo : { begin_foo = .; *(.foo) end_foo = .; }
}

Diff Detail

Event Timeline

evgeny777 updated this revision to Diff 65082.Jul 22 2016, 8:27 AM
evgeny777 retitled this revision from to [ELF] Symbol assignment within input section list.
evgeny777 updated this object.
evgeny777 added a reviewer: ruiu.
evgeny777 set the repository for this revision to rL LLVM.
evgeny777 added a project: lld.
evgeny777 added subscribers: grimar, ikudrin, emaste.
ruiu added inline comments.Jul 23 2016, 4:17 AM
ELF/LinkerScript.cpp
551–553

Why don't you create absolute symbols here? I think it will simplifies the code.

842

I think SymbolAssignment is copy-assignable. You want to create a new instance of Assignment here and copy-assign to eliminate use of std::unique_ptr. (It's inefficient but it's really marginal.)

evgeny777 added inline comments.Jul 25 2016, 3:46 AM
ELF/LinkerScript.cpp
551–553

My understanding is that symbols defined inside output section definition should have values relative to section base address, so relocation could be applied for them. However symbols defined inside SECTIONS {} block do not belong to any output section, so they have absolute values.

842

Not sure what you mean. We store commands in a form of array of smart pointers to values (not values themselves): std::vector<std::unique_ptr<BaseCommand>>

So why do we need copy assignment here?

evgeny777 updated this revision to Diff 65324.Jul 25 2016, 3:57 AM
evgeny777 added a reviewer: davide.
evgeny777 removed rL LLVM as the repository for this revision.

Adding Davide to reviewers.

Davide, it looks like you handle PROVIDE (and PROVIDE_HIDDEN) inside output section definition just the same way as PROVIDE inside SECTIONS block, adding absolute values. To my understanding this is not correct, because such symbols should have values relative to section base address. Also location counter changes, while input sections are added to output section and your patch seems to ignore this, always using the value of 'Dot'.

Also there were the following issues in linkerscript-provide-in-section test case

a) Section name mismatch (.blah != blah)
b) SHF_ALLOC attribute missing on .blah
c) It looks like llvm-mc does not support -relocation-model attribute any longer, so now executable is being generated instead of shared object

davide edited edge metadata.Jul 25 2016, 9:19 AM

Adding Davide to reviewers.

Davide, it looks like you handle PROVIDE (and PROVIDE_HIDDEN) inside output section definition just the same way as PROVIDE inside SECTIONS block, adding absolute values. To my understanding this is not correct, because such symbols should have values relative to section base address. Also location counter changes, while input sections are added to output section and your patch seems to ignore this, always using the value of 'Dot'.

Yes, I was thinking the same, and writing a patch. You beat me to the punch apparently, so, I'll just review it.

Also there were the following issues in linkerscript-provide-in-section test case

a) Section name mismatch (.blah != blah)
b) SHF_ALLOC attribute missing on .blah
c) It looks like llvm-mc does not support -relocation-model attribute any longer, so now executable is being generated instead of shared object

I fixed a). Are b) and c) really problems?

davide accepted this revision.Jul 25 2016, 9:22 AM
davide edited edge metadata.

This seems OK, but please wait for Rui for final sign-off. Sorry if I missed this (there's enough activity on linker scripts recently that's hard to keep track of all the things going on). I recommend you to cc: me on you next patches to avoid this kind of issues. I'll do the same.

This revision is now accepted and ready to land.Jul 25 2016, 9:22 AM
ruiu edited edge metadata.Jul 25 2016, 2:09 PM

createSections and addScriptedSymbols are already too long, I don't want to add more code to that function. They need to be simplified before adding features.

ruiu added a comment.Jul 25 2016, 4:09 PM

I've split createSections, so please rebase. Sorry for your inconvenience.

evgeny777 updated this revision to Diff 65480.Jul 26 2016, 2:12 AM
evgeny777 edited edge metadata.

Rebased.
Rui, your getSectionMap() is not very convenient, when you want to deal with multiple types of objects, and for this patch I need both
InputSectionDescription (to update section offset) and SymbolAssignment (to save offset) in the same iteration.
I believe that visitor is more handy here.

ruiu added a comment.Jul 26 2016, 12:18 PM

Not sure if the visitor pattern is a good choice, but there's a correctness issue with this patch so I'll send this comment now.

ELF/LinkerScript.cpp
120–121

This offset computation is not accurate. On MIPS or ARM, input sections may have additional pieces of data, or thunks, at their ends. Thunks are added to input sections in Writetr::finalizeSections, so in this function we don't know the exact size of input sections. So this is computing offsets too early.

ruiu added a comment.Jul 26 2016, 12:40 PM

By the way, can you give me a few sample linker scripts that depend on this feature?

Linking our proprietary micro kernel depends on this feature, but I can't share any details, of course

BTW, FreeBSD also uses it:

https://svnweb.freebsd.org/base/head/sys/conf/ldscript.amd64?revision=284870&view=markup#l100

evgeny777 added inline comments.Jul 26 2016, 1:38 PM
ELF/LinkerScript.cpp
120–121

Thanks for pointing this out. I suggest calculating offsets in addScriptedSymbols than. What's your opinion?

ruiu added inline comments.Jul 26 2016, 1:41 PM
ELF/LinkerScript.cpp
120–121

If you can do it, it would work, but I"m not sure how you are going to do it.

evgeny777 added inline comments.Jul 26 2016, 1:59 PM
ELF/LinkerScript.cpp
120–121

I think this can be done by implementing a special sort of input section (like SymbolInputSection<ELFT>) with zero size and alignment of 1. One can store a list of such sections in LinkerScript<ELFT> and calculate real offset, taking SymbolInputSection<ELFT>::OutSec as a starting point.

I believe there should be other ways also. Need some research.

ruiu added inline comments.Jul 26 2016, 2:09 PM
ELF/LinkerScript.cpp
120–121

I haven't thought hard enough yet, but it sounds at least not a bad idea.

The other idea (I'm not suggesting it's better than yours) would be to create DefinedRegular symbols instead of DefinedSymthetic symbols. Synthetic symbols are associated to output sections, while regular symbols are associated to input sections, so the linker don't need to adjust offsets later.

evgeny777 updated this revision to Diff 65753.Jul 27 2016, 9:07 AM

Review updated.

Changes:

  1. If section contains only symbols those symbols are added as absolute. This is what gold linker does,

and also allows to pass Davide test (linkerscript-provide-in-section.s)

  1. Symbols declared inside output section definition are added as virtual input sections to corresponding output sections.

After assignOffsets() is called from Writer::finalizeSections(), those virtual section offsets (OutSecOff) are equal to values of dot symbol.

evgeny777 added inline comments.Jul 27 2016, 9:42 AM
ELF/LinkerScript.cpp
162

AddPendingSyms() adds symbols to current output section, clearing the pending state.
This happens in two cases:
a) Before we add non-virtual input section to output section, but after output section was created in call to Factory.create()
b) After we processed all input section commands. The list of pending symbols may not be empty, if symbols are declared in the end of section description, for example 'end_foo' symbol:

SECTIONS { .foo : { *(.foo) : end_foo = .; } }
ELF/LinkerScript.h
63

This is no longer needed, I'll remove in next iteration

ruiu added inline comments.Jul 27 2016, 2:19 PM
ELF/LinkerScript.cpp
409–410

Now you have a dummy input section. Is there any reason to use Synthetic symbols instead of Regular symbols?

813–816

Why do you need to handle this case as a special case?

821–822

readAsssignment will never return a null pointer, so you can remove if.

ELF/LinkerScript.h
155

It seems you are storing only unique_ptr<SymbolInputSection> to this vector. Why don't you use SymbolInputSection instead of InputSectionBase?

ELF/Writer.cpp
736

People who are reading this code for the first time don't know what assignOffset did and what the following function is going to do, so writing only an exception wouldn't help them. Instead please write about what this is going to do.

737

So, does this work? Adding new symbols changes the size of .symtab and .strtab, no?

evgeny777 added inline comments.Jul 27 2016, 2:38 PM
ELF/LinkerScript.cpp
409–410

I think its possible to add regular symbol to this fake input section and this might work as well. But is this any easier?

813–816

You can have script like this:

SECTIONS { .dummy { a = .; b = . + 2; } }

There are no input sections, so output section will never be created, so you can't add symbols anywhere.

ELF/LinkerScript.h
155

Just because I wanted to keep this class local to LinkerScript without any forward declarations in header file. But can change this of course.

ELF/Writer.cpp
737

You're right, this is a bug. But I think this can be fixed easily by calling addSynthetic in createSections(), and setting symbol values in addSyntheticSymbols().
If the whole approach is okay and there is no need for radical changes then I'll certainly do this.

ruiu added inline comments.Jul 27 2016, 2:44 PM
ELF/LinkerScript.cpp
409–410

I guess it makes things easier because if you attach symbols to input sections, their final offsets are automatically adjusted as input sections get final offsets. That means you can add symbols earlier.

813–816

Is it a real use case? I'm inclined not to add code to handle special cases.

ELF/LinkerScript.h
155

I think adding a forward declaration would be better.

evgeny777 added inline comments.Jul 27 2016, 2:48 PM
ELF/LinkerScript.cpp
813–816

I don't know. This mostly addresses Davide test case (linkerscript-provide-in-sections.s). This is also what gold does in such cases.
If you want to remove this, is it ok to remove the test case then?

ruiu added inline comments.Jul 27 2016, 2:50 PM
ELF/LinkerScript.cpp
813–816

Yeah, please remove it from this patch. We can add it later if we actually need it.

evgeny777 added inline comments.Jul 28 2016, 4:38 AM
ELF/LinkerScript.h
155

I tried it, but looks like forward declaration is not enough, because elements are wrapped inside unique_ptr<T>. To my understanding there are two alternatives:
a) Use raw pointers instead of unique_ptr<T>
b) Put SymbolInputSection<ELFT> declaration to LinkerScript.h (or even InputSection.h). This means explicit template instantiations should be added as well.

If this is really needed please suggest the correct way of implementation.

evgeny777 updated this revision to Diff 65912.Jul 28 2016, 4:49 AM
evgeny777 set the repository for this revision to rL LLVM.

Review updated.

ruiu added inline comments.Jul 28 2016, 1:14 PM
ELF/LinkerScript.cpp
46

val -> Val

47

This is more straightforward.

memset(Val, 0, sizeof(*Val);
return Val;
66

Remove the blank line.

67

It's a bit misleading. This section is written, but because it is empty, there's nothing to be written in reality.

179–182

I do not understand why you needed a concept of "pending" here. You can add a SymbolInputSection as soon as you see a new SymbolAssignment, can't you?

694

REmove this variable and directly add to Opt.Commands.

789

Remove this variable and directly add to Cmd->Commands.

ELF/LinkerScript.h
152–153

I think we name this kind of thing OwningSections.

evgeny777 added inline comments.Jul 28 2016, 1:22 PM
ELF/LinkerScript.cpp
67

Not exactly. If you don't set SHT_NOBITS, lld will *attempt* to write the section and fail because file pointer is set to null.

179–182

Imagine the symbol is declared before any input section rule, like in the sample below:

.foo : { begin_foo = .; *(.foo) }

Which output section will own it? You can't create output section given symbol assignment command, because you know only section name, not section attributes, which can be obtained only from InputSectionBase<ELFT>.

rafael added inline comments.Jul 29 2016, 9:13 AM
ELF/LinkerScript.cpp
147

Sorry, it is not clear why you need a new section type, it looks like you can always assign the symbol to an existing input or output section, no?

The general case you will have something like

foo_and_bar : {

begin_foo = .; *(.foo*) end_foo = .;
begin_bar = .; *(.bar*); end_bar = .;

}

If there is any input section following a symbol, associate the symbol with the input section with an offset of 0.
If there is no input section following a symbol, associate the symbol with the end of the output section.

Interesting idea, but it doesn't sound any easier to me. You'll likely maintain list of symbols instead of list of SymbolInputSection<ELFT>. Also will it work in this case:

.foo {
    *(.foo)
    end_foo = .;
   . = ALIGN(0x1000);
}
rafael added inline comments.Jul 29 2016, 9:46 AM
ELF/LinkerScript.h
152

I don't think Relative is the correct terminology.

In .o files a symbol is relative to a section.
In a .so or executable a symbol value is a virtual address. In fact, the entire section table is optional. Using an actual section number instead of ABS is just being human friendly.

rafael added inline comments.Jul 29 2016, 9:56 AM
ELF/LinkerScript.cpp
67

Include something like that in the comment.

71

This logic is duplicated with addAbsoluteSymbols. We probably need a predicate.

85

This is a bit odd.

The way normal DefinedRegular symbols are handled is to have a Value that is the offset in the input section. In this case, since the input section is a dummy empty section that should always be 0.

It is the offset of the input section in the output section that we should be computing.

evgeny777 added inline comments.Jul 29 2016, 10:15 AM
ELF/LinkerScript.cpp
85

What about this expression? Will this work, if you always set Value to 0?

a = ALIGN(8);
rafael added inline comments.Jul 29 2016, 10:15 AM
ELF/LinkerScript.cpp
695

The part about parsing assignment, PROVIDE and PROVIDE_HIDDEN is duplicated with readSections. Can that be refactored?

evgeny777 updated this revision to Diff 66319.Aug 1 2016, 7:31 AM
evgeny777 removed rL LLVM as the repository for this revision.

Review updated.
Added helper class OutputSectionBuilder to reduce size of LinkerScript::createSections

ruiu added a comment.Aug 1 2016, 5:09 PM

Sorry to bring this back again, but I'm wondering why we can't create absolute symbols. This is different from what GNU linkers do, but looks like difference between section-relative symbols and absolute symbols don't matter. If we create absolute symbols, we can remove the virtual input section class.

ELF/LinkerScript.cpp
278

Why did you have to make a change to this function?

I think the main reason, we're using virtual input sections is that this the only way to calculate correct symbol offset. As you may know location counter is not incremented while we add input sections to output section, and the true size of input sections is known only after call to OutputSectionBase<ELFT>::assignOffsets().

So if you suggest an algorithm, which can calculate correct symbol value (w/o using virtual input sections) in the case below:

.foo : { *(.foo); end_foo = .; *(.bar) }

then we can probably switch to absolute symbols (BTW we can also use synthetic symbols - there is a little difference, if any).
Another interesting question is what will happen if we define absolute symbol in shared object and reference it in executable? For example:

/* script for linking shared library */
SECTIONS { .text : { text_start = .; *(.text) } }

So, when shared library is loaded by application, what value would text_start have, in case it is absolute? I don't know yet, but will try.

ELF/LinkerScript.cpp
278

Two main reasons:

  1. During filtering process some output sections may be removed. Those sections may contain symbols and SymbolInputSection object have already been created for them. To avoid crashes and/or creating dummy symbols I have to remove those virtual sections as well
  1. The old implementation is not technically correct, because it removes only first output section found in name lookup. We're still using OutputSectionFactory<ELFT>, so we may have several sections with the same name.

Another reason (though much less significant) is that one-by-one removal from std::vector must be slow, because it stores elements on continuous region of memory.

grimar added a comment.Aug 2 2016, 4:30 AM
In D22683#502851, @ruiu wrote:

Sorry to bring this back again, but I'm wondering why we can't create absolute symbols. This is different from what GNU linkers do, but looks like difference between section-relative symbols and absolute symbols don't matter. If we create absolute symbols, we can remove the virtual input section class.

I think one of the reasons can be:
(https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/expressions.html)
"The position of the expression within the linker script determines whether it is absolute or
relative. An expression which appears within an output section definition is relative to the
base of the output section. An expression which appears elsewhere will be absolute.

A symbol set to a relative expression will be relocatable if you request relocatable output
using the ‘-r’ option. That means that a further link operation may change the value of
the symbol. The symbol’s section will be the section of the relative expression"

For absolute symbols there is ABSOLUTE() keyword:
ABSOLUTE(exp)
Return the absolute (non-relocatable, as opposed to non-negative) value of the
expression exp. Primarily useful to assign an absolute value to a symbol within
a section definition, where symbol values are normally section relative.

ELF/Writer.cpp
714

Missing dot at the end of last sentence.

evgeny777 updated this revision to Diff 66649.Aug 3 2016, 5:44 AM

Review updated.

  • createThunks() is called before createSections()
  • Output section size and input section offset are calculated in addSection()
  • assignOffsets made private and called only in sortCtorsDtors() and sortInitFini().
evgeny777 updated this revision to Diff 66828.Aug 4 2016, 10:46 AM

Review updated.
Rebased + merged addSymbolToSymtabAux (suggestion from rafael)

rafael added inline comments.Aug 4 2016, 12:45 PM
ELF/LinkerScript.cpp
47

You don't need the inline.

52

Or here .

59

Cmd can be a reference.

204

Please keep the name Cmd instead of I.

206

You are reverting Rui's patch in here. Please assign getInputSecitons to the Sections vector.

evgeny777 added inline comments.Aug 4 2016, 1:02 PM
ELF/LinkerScript.cpp
47

Just curious - why?

evgeny777 updated this revision to Diff 66854.Aug 4 2016, 1:53 PM

Diff updated.

ruiu added a comment.Aug 4 2016, 4:56 PM

I'm still thinking about the best way to support it, and I feel like there's a way, but I think it's better to land this. We can improve it later if needed.

There's a question for you. How are you going to support assignments to "." inside output section descriptors?

ELF/OutputSections.cpp
845–847

Okay, so you seems to call assignOffsets() in sortInitFini because we need to assign offsets after sorting input sections. But if you sort input sections, the total output section size may change, because padding sizes to satisfy alignment requirements may change.

I think you wan tto compute the total size of sections inside assignOffsets. I mean, you can make this function really just add input section to the vector and does nothing more than that, and let assignOffsets compute the alignment and the output section size.

894

If there's no .init or .ctors, assignOffsets is not called? Is it correct?

If we calculate input section offset in addSection() and do not change it later then assignment to "." should be easy. I think, its enough to add a parameter to addSection, like this

template <class ELFT>
void OutputSection<ELFT>::addSection(InputSectionBase<ELFT> *C, uintX_t Offset) {
  
  // ... add section to Sections vector and set OutSec.
  
  uintX_t Off = alignTo(this->Header.sh_size + Offset, S->Alignment);
  S->OutSecOff = Off;

  // ... calculate size
}

I haven't tried it yet, but at a first glance this should work

ELF/OutputSections.cpp
845–847

Doing so will complicate adding symbols, because you don't know symbol offset when you process it in LinkerScript<ELFT>::createSections.
We don't want to switch back to virtual input section, AFAIK, so I wonder ho do you suggest handling symbols this time.

BTW, please look at this review:
https://reviews.llvm.org/D23197

It refactors sorting of special sections.

894

I think so. We call this method only if Factory.lookup() return us a section

evgeny777 updated this revision to Diff 67179.Aug 8 2016, 9:12 AM

Review updated.
Added support for "." assignment within input section list. Now the following script can be successfully parsed and processed:

SECTIONS {
   .foo : { 
        * (.foo)
        sizeof_foo_1 = SIZEOF(.foo); /* sizeof_foo_1 is equal to sizeof( *.foo ) */
       * (.bar)
       sizeof_foo_2 = SIZEOF(.foo); /* sizeof_foo_2 is equal to sizeof ( *.foo ) + sizeof ( *.bar) */
       . = ALIGN(0x1000); /* The section ends on page boundary */ 
   }
}

This patch uses virtual input sections (once again, yeah), but they're created and processed inside LinkerScript<ELFT>, without any change introduced to base class InputSection<ELFT>. The reason is that with "." assignments real section size depends on VA and can only be calculated in assignAddresses(). For example sizes of .foo and .bar output sections in example below will be different:

SECTIONS {
    . = 0x500;
    .foo {  . = ALIGN(0x1000);  }  /* size of .foo is 0x1000 - 0x500 */        
}
SECTIONS {
    . = 0xA00;
    .bar {  . = ALIGN(0x1000);  }  /* size of .bar is 0x1000 - 0xA00 */        
}

Please also check the test case

ruiu added a comment.Aug 8 2016, 2:41 PM

Let's submit this soon. After submitting this, we could probably simplify things, but this patch provides a real value and is a good start. But before doing this, I think this needs to be fixed...

ELF/LinkerScript.cpp
186

Repl is not expected to be null, so it seems dangerous. Why don't you just leave as is?

327

Use cast instead of static_cast.

330

Ahh, so you are using the absence of Repl as a marker of your input section. It's a bad use of this field. Please just define a new Kind and use it.

evgeny777 updated this revision to Diff 67306.Aug 9 2016, 2:52 AM

Review updated.
It looks like OutputSectionBase derived classes are ordinary C++ polymorphic types and do not currently support LLVM-style casting.
That's why I used getSection virtual method, instead of cast<> (see assignOffsets in LinkerScript.cpp)

Rui, it was not very clear from your latest comment if this patch can be landed or you want to hold it on for further review. Could you please clarify? Thanks.

evgeny777 updated this revision to Diff 67565.Aug 10 2016, 11:35 AM

Review updated. The assignOffsets() now uses LLVM-style casting (https://reviews.llvm.org/D23351)

ruiu accepted this revision.Aug 10 2016, 6:18 PM
ruiu edited edge metadata.

LGTM

ELF/LinkerScript.cpp
140

I don't think it's a good idea to piggyback on InputSection<ELFT> because LayoutInputSection is (and should be) fairly different from InputSection. LayoutSection has no relocation information, no data, no thunks, etc.

In general, please try to make class hierarchy shallower if possible. I don't think you need to address it now, but I'll take a look after it's submitted.

339

Use auto.

347

Use auto.

359–361

Do you have to do this in this for-loop? It seems that you can do this only once after this for-loop.

evgeny777 added inline comments.Aug 11 2016, 1:05 AM
ELF/LinkerScript.cpp
359–361

I'm evaluating symbols in this for-loop, so I need to have actual size, in case there is SIZEOF() in expression, i.e

.foo : {
   *(.aaa)
    size_1 = sizeof(.foo); /* size_1 is sizeof (*.aaa) */
   *(.bbb)
    size_2 = sizeof(.foo); /* size_2 is sizeof (*.aaa) + sizeof (.bbb) */
}
evgeny777 closed this revision.Aug 11 2016, 5:38 AM