This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Implemented PROVIDE linker script command.
AbandonedPublic

Authored by grimar on Apr 16 2016, 9:44 AM.

Details

Reviewers
ruiu
rafael
Summary

Patch implements PROVIDE command. It is often used in scripts I saw.

Description of PROVIDE (taken from https://www.sourceware.org/binutils/docs-2.10/ld_3.html#SEC17):
In some cases, it is desirable for a linker script to define a symbol only if it is referenced and is not defined by any object included in the link. For example, traditional linkers defined the symbol etext'. However, ANSI C requires that the user be able to use etext' as a function name without encountering an error. The PROVIDE keyword may be used to define a symbol, such as `etext', only if it is referenced but not defined. The syntax is PROVIDE(symbol = expression).

Diff Detail

Event Timeline

grimar updated this revision to Diff 53990.Apr 16 2016, 9:44 AM
grimar retitled this revision from to [ELF] - Implemented PROVIDE linker script command..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar.
ruiu added inline comments.Apr 18 2016, 11:41 AM
ELF/LinkerScript.cpp
170–176

I'd make the symbol table visible from the linker script in order to eliminate this plumbing between LinkerScript and Writer.

grimar updated this revision to Diff 54184.Apr 19 2016, 6:23 AM
  • Addressed review comments.
rafael added inline comments.Apr 19 2016, 6:45 AM
test/ELF/linkerscript-provide.s
50

Are these symbols produced as absolute by bfd and gold?

It would be really nice if they could be produced with an output section and offset. The advantaged is that offsets can be computed earlier. With that some day we could cleanup relocation processing further by representing relocation target as OutputSection + Offset. That way we would not need enums just to select getVA, getGotVA, getPltVA, etc.

grimar added inline comments.Apr 19 2016, 6:55 AM
test/ELF/linkerscript-provide.s
50

bfd and gold disagree here:

PROVIDE (begin_ext = .);
.plus : 
{
 PROVIDE (begin_sec = .);
 *(.plus) 
 PROVIDE (end_sec = .);
}
PROVIDE (end_ext = .);

bfd:

 Symbol {
  Name: begin_ext (1)
  Value: 0x1000
  Size: 0
  Binding: Global (0x1)
  Type: None (0x0)
  Other: 0
  Section: .plus (0x1)
}
Symbol {
  Name: end_ext (18)
  Value: 0x1008
  Size: 0
  Binding: Global (0x1)
  Type: None (0x0)
  Other: 0
  Section: .plus (0x1)
}
Symbol {
  Name: end_sec (26)
  Value: 0x1008
  Size: 0
  Binding: Global (0x1)
  Type: None (0x0)
  Other: 0
  Section: .plus (0x1)
}
Symbol {
  Name: begin_sec (34)
  Value: 0x1000
  Size: 0
  Binding: Global (0x1)
  Type: None (0x0)
  Other: 0
  Section: .plus (0x1)
}

gold:

  Symbol {
  Name: begin_ext (8)
  Value: 0x1000
  Size: 0
  Binding: Global (0x1)
  Type: None (0x0)
  Other: 0
  Section: Absolute (0xFFF1)
}
Symbol {
  Name: begin_sec (18)
  Value: 0x1000
  Size: 0
  Binding: Global (0x1)
  Type: None (0x0)
  Other: 0
  Section: .plus (0x1)
}
Symbol {
  Name: end_ext (28)
  Value: 0x1008
  Size: 0
  Binding: Global (0x1)
  Type: None (0x0)
  Other: 0
  Section: Absolute (0xFFF1)
}
Symbol {
  Name: end_sec (36)
  Value: 0x1008
  Size: 0
  Binding: Global (0x1)
  Type: None (0x0)
  Other: 0
  Section: .plus (0x1)
}

If at all possible my preference would be:

  • Require PRODIVE to be in an unambiguous location: inside an

OutputSection definition.

  • Produce DefinedSynthetic symbols for them so that the section is set

correctly.

Cheers,
Rafael

First will be incompatible with some existing bsd scripts, for example:
https://svnweb.freebsd.org/base/head/sys/conf/ldscript.amd64?view=markup
It has
PROVIDE (start_ctors = .);
..
PROVIDE (stop_ctors = .);
declared outside output sections definition.
Would it be acceptable to require that ?

George.


llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

ruiu added inline comments.Apr 19 2016, 10:31 AM
ELF/LinkerScript.cpp
129–132

Can you make SymbolTable a member of LinkerScript and directly call addAbsolute here? Then I think you can remove ElfSym<ELFT>::Provide.

emaste added a subscriber: emaste.EditedApr 19 2016, 11:14 AM

First will be incompatible with some existing bsd scripts, for example:
https://svnweb.freebsd.org/base/head/sys/conf/ldscript.amd64?view=markup
It has
PROVIDE (start_ctors = .);
..
PROVIDE (stop_ctors = .);
declared outside output sections definition.
Would it be acceptable to require that ?

In the case of (at least) start_ctors and stop_ctors the script is more clear with them inside of the .ctors : { } so it's reasonable for FreeBSD to move them.

I do wonder how common this will be in practice across arbitrary 3rd party software (that does not have a vested interest in linking with lld), but it seems reasonable to proceed with Rafael's suggestion for now and revisit later if necessary.

grimar updated this revision to Diff 54367.Apr 20 2016, 7:30 AM
  • Addressed review comments.

Changed addAbsolute to addSynthetic as was suggested by Rafael. That allows to add provide symbols early without updating symbols values during assignaddress.
I introduced next limitation for that: provide expression can't be anything except assign location counter. i.e only "xxx = ." is valid. And since
only provides inside output description sections are now allowed, that helps to avoid storing symbols in array for futher modification.
And it still covers the needs of script files I saw, so it should be reasonable simplification.

grimar added inline comments.Apr 20 2016, 7:31 AM
ELF/LinkerScript.cpp
129–132

I think I can not directly call addAbsolute from LinkerScript::assignAddresses() because
would catch undefined symbol error then if symbol is referenced.
I need this symbols to be added before symbol resolution and that is why had to add
LinkerScript::createSymbols() method.

But I was able to get rid of ElfSym<ELFT>::Provide after addressing Rafael's suggestion, I also
added some limitations (please read latest patch changelist) that allows
to simplify the code.

grimar updated this revision to Diff 54369.Apr 20 2016, 7:36 AM
  • Reformated testcase.
rafael edited edge metadata.Apr 20 2016, 12:47 PM

I think this is fine, but since it is linker script please wait for Rui's final check.

ELF/LinkerScript.cpp
96–97

Using ArrayRef is a nice independent cleanup. Please commit it first.

171

This only handles symbols at the start and end of output sections, right?

I guess that is a reasonable restriction for now. We can add support afterwards for recording an offset.

ruiu added inline comments.Apr 20 2016, 1:29 PM
ELF/LinkerScript.cpp
180–181

Move this inside the for loop to make their scopes as narrow as possible.

ELF/LinkerScript.h
46–49

ContentKind is very confusing and seems unnecessary. Why don't you process ProvideKind elements in assignAddresses? You should be able to define symbols using the current VA in the function.

grimar marked 2 inline comments as done.Apr 21 2016, 5:43 AM
grimar added inline comments.
ELF/LinkerScript.cpp
96–97

Done. r266978

171

Yes. And probably we will not need adding anything else here. I never saw in linkerscripts places where
PROVIDE inside output section description was attached to something different than end/start.

180–181

I can't do that.
SecName is used to keep what section we currently processing (addSynthetic requires section as argument).
Val value will change and is different for start of the section and end. And since ProvideKind can be
somewhere between/after ContentKind command, we should have that value already prepared, as
ProvideKind itself does not let know - is it start or end section symbol.

ELF/LinkerScript.h
46–49

In initial version of this patch, addAbsolute was used, now addSynthetic is. There is no need to save list of symbols and update values in assignAddresses anymore, as they are always attached to begin/end of output section. That is current restriction of the patch discussed above in comments.

Also:

  1. Imagine sample script and code:
.global _start
_start:
  call used1;
SECTIONS
{
  .text :
    {
      PROVIDE (used1 = .);
      *(.text)
    }
}

If I not define symbols in createSymbols(), I`ll get undefined symbol error before reaching assignAddresses().

  1. Imagine another script:
echo SECTIONS
{
  .text : 
  { 
    PROVIDE (beginText = .);
    *(.text)
    PROVIDE (endText = .);
    }
}

Commands list currently would be:

SectionKind
  ProvideKind 
    ContentKind
  ProvideKind

So all ProvideKinds that are before ContentKind are attached to the begin of section, others - to the end. ContentKind is used to actually move VA.

Without ContentKind it would be:

SectionKind
  ProvideKind 
  ProvideKind

How it is possible to know where is start/end provides then ? And when to move VA forward ?

grimar updated this revision to Diff 54489.Apr 21 2016, 5:48 AM
grimar edited edge metadata.
grimar marked 2 inline comments as done.
  • Rebase

As I was reading this patch and comments I had some concerns about the different behavior between bfd and gold and that error messages would not provide enough information. The patch seem to address these concern so I'm happy with this patch.

Since I know too little about the lld source I don't dare to accept revision. But thanks for the great work!

PS: My use case for linker scripts is hobby osdev.

As I was reading this patch and comments I had some concerns about the different behavior between bfd and gold and that error messages would not provide enough information. The patch seem to address these concern so I'm happy with this patch.

Since I know too little about the lld source I don't dare to accept revision. But thanks for the great work!

PS: My use case for linker scripts is hobby osdev.

Thanks for feedback ! For usual errors in script we already report script line + column (D18699),
I think one day we will connect that error reporting system with expression evaluations code somehow, so
finding errors will be easier.

ruiu added inline comments.Apr 21 2016, 2:36 PM
ELF/LinkerScript.h
46–49

If it is currently hard to do to process PROVIDE command as we assign addresses to sections, then something is not right. The way we represent SECTIONS subcommands are probably inappropriate. Building a feature on top of the data structure will things more complicated. I'll fix this first.

ruiu edited edge metadata.Apr 21 2016, 6:29 PM

I spent half a day on poking around LinkerScript and Writer to figure out how to implement SECTIONS sub-commands nicely, and my conclusion is that the current code needs redesigning.

The current design is structurally hard to handle SECTIONS command because both Writer and LinkerScript have partial view of the sections being constructed. Writer creates output sections in Writer::createSections, and LinkerScript is barely involved with that process. However, if linker scripts are in use, then the constructed output sections are passed to LinkerScript to create symbols, layout sections while evaulating linker script expressions. It is hard to do it right because some information are lost between Writer and LinkerScript. It's not just hard to understand, but it also sets a limitation on what we can do with SECTIONS subcommands. For example, this patch can't handle expressions that appear in middle of an output section description. Adding a support for that would be at least tricky.

The right way I think is to make LinkerScript create output sections, instead of using Writer's createSections, if there is a SECTIONS command, so that LinkerScript controls everything about layout. In the new function, we should order sections *and* assign addresses at the same time. As long as sections sizes are fixed by then (this needs some refactoring as well), it should be doable. It is probably the cleanest and simplest way to handle SECTIONS command, as it is going to be a single pass and won't require any intermediate representation as we interpret SECTIONS sub-commands directly.

I'm going to explore this more.

grimar added a comment.May 5 2016, 7:57 AM
In D19190#408584, @ruiu wrote:

I spent half a day on poking around LinkerScript and Writer to figure out how to implement SECTIONS sub-commands nicely, and my conclusion is that the current code needs redesigning.

The current design is structurally hard to handle SECTIONS command because both Writer and LinkerScript have partial view of the sections being constructed. Writer creates output sections in Writer::createSections, and LinkerScript is barely involved with that process. However, if linker scripts are in use, then the constructed output sections are passed to LinkerScript to create symbols, layout sections while evaulating linker script expressions. It is hard to do it right because some information are lost between Writer and LinkerScript. It's not just hard to understand, but it also sets a limitation on what we can do with SECTIONS subcommands. For example, this patch can't handle expressions that appear in middle of an output section description. Adding a support for that would be at least tricky.

The right way I think is to make LinkerScript create output sections, instead of using Writer's createSections, if there is a SECTIONS command, so that LinkerScript controls everything about layout. In the new function, we should order sections *and* assign addresses at the same time. As long as sections sizes are fixed by then (this needs some refactoring as well), it should be doable. It is probably the cleanest and simplest way to handle SECTIONS command, as it is going to be a single pass and won't require any intermediate representation as we interpret SECTIONS sub-commands directly.

I'm going to explore this more.

My vision is here: http://reviews.llvm.org/D19976