This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - linkerscript AT keyword (in output section description) implemented.
ClosedPublic

Authored by grimar on Apr 19 2016, 9:54 AM.

Details

Summary

The linker will normally set the LMA equal to the VMA.
You can change that by using the AT keyword.
The expression lma that follows the AT keyword specifies
the load address of the section.

Patch implements this keyword.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar updated this revision to Diff 54212.Apr 19 2016, 9:54 AM
grimar retitled this revision from to [ELF] - linkerscript AT keyword (in output section description) implemented..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar.
emaste added a subscriber: emaste.Apr 19 2016, 10:19 AM
ruiu added inline comments.Apr 19 2016, 10:47 AM
ELF/LinkerScript.cpp
170 ↗(On Diff #54212)

Ah, so PA stands for Physical Address, but is it a common name? At least it doesn't feel "physical" to me. Why don't you name this getAddr?

Also I'd rename Name -> SectionName.

232 ↗(On Diff #54212)

readAt for consistency.

434 ↗(On Diff #54212)

Use setError.

442–444 ↗(On Diff #54212)

No need to reuse a variable.

if (peek() == "AT")
 readAt(OutSec);
ELF/LinkerScript.h
71 ↗(On Diff #54212)

What does PA stand for?

91 ↗(On Diff #54212)

StringMap copies keys. Do you want to use DenseMap instead?

ELF/Writer.cpp
1464–1466 ↗(On Diff #54212)

Why we want to create a new load segment for sections with AT commands?

grimar updated this revision to Diff 54337.Apr 20 2016, 3:17 AM
  • Addressed review commens.
ELF/LinkerScript.cpp
170 ↗(On Diff #54212)

I took "physical address" term from description of p_paddr field:
https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-83432/index.html
I did not name getAddr() because it seems confusing to me, when I hear "address" I usually think about VMA, since LMA is not
really common.
But I think we can call it getLma(), as LMA is a term used for AT expressions:
https://www.sourceware.org/binutils/docs-2.12/ld.info/Output-Section-LMA.html

232 ↗(On Diff #54212)

Done.

434 ↗(On Diff #54212)

Done.

442–444 ↗(On Diff #54212)

Done.

ELF/LinkerScript.h
91 ↗(On Diff #54212)

Yes, I know it do. I see no problem here. StringMap can be a bit faster for lookup, but this place should not be
bottleneck though, so I think there are no reasons to use it or not to do that here,
because amount of keys should be little either.
I used it mostly to be consistent with Filler field. May be we should combine them into one variable ?

struct OutputSectionData {
  std::vector<uint8_t> Filler;
  std::vector<StringRef> PhysAddrExpr;
};

llvm::StringMap<OutputSectionData> OutSections;
ELF/Writer.cpp
1464–1466 ↗(On Diff #54212)

Section header has only one address field sh_addr which is VMA:
https://docs.oracle.com/cd/E19455-01/806-3773/elf-2/index.html
So there is not way to set LMA for section, only for load segment itself I believe.
And that means that for section that explicitly requested LMA new load has
to be created.

Both gold and bfd do the same.

AT commands changes p_paddr of PT_LOAD header in binutils:
https://sourceware.org/ml/binutils/2011-06/msg00262.html

ruiu edited edge metadata.Apr 20 2016, 3:15 PM

I wonder if you are implementing this feature for some specific product. For example, are you trying to link some kernel or something? Or are you doing this for the sake of completeness? It seems to me that AT command is a tricky option that you want to use only when you are dealing with embedded systems or something.

ELF/LinkerScript.cpp
250 ↗(On Diff #54337)

Can you name this getLoadAddr and add comments what the load address is?

ELF/Writer.cpp
1467–1470 ↗(On Diff #54337)

Then please explain why we need to do this instead of what we are doing here in the comment.

// Segments are contiguous memory regions that has the same attributes
// (e.g. executable or writable). There is one phdr for each segment. Therefore,
// we need to create a new phdr when the next section has different flags or
// is loaded at a discontiguous address using AT linker script command.
In D19272#407140, @ruiu wrote:

I wonder if you are implementing this feature for some specific product. For example, are you trying to link some kernel or something? Or are you doing this for the sake of completeness? It seems to me that AT command is a tricky option that you want to use only when you are dealing with embedded systems or something.

I am using freebsd script as reference (https://svnweb.freebsd.org/base/head/sys/conf/ldscript.amd64?view=markup), assuming that it contains some
subset of features that is reasonable to implement. It uses AT. It definitely looks tricky feature, but it was easy to implement, so I did it because
assumed that supporting it is better than ignoring or something.
Though it would be interesting to know how much is it important to have and can it be replaced with something/ignored or not (I don't know answer on this question).

grimar updated this revision to Diff 54482.Apr 21 2016, 4:11 AM
grimar edited edge metadata.
grimar marked 2 inline comments as done.
  • Addressed review comments.
ELF/LinkerScript.cpp
250 ↗(On Diff #54337)

Done.

ELF/Writer.cpp
1467–1470 ↗(On Diff #54337)

Done.

Note that AT was introduced to the FreeBSD kernel linker script only recently, https://svnweb.freebsd.org/base/head/sys/conf/ldscript.amd64?r1=284869&r2=284870&

For testing I'll revert that commit but we will need it to be supported.

grimar updated this revision to Diff 65049.Jul 22 2016, 3:47 AM
  • Rebased.
evgeny777 added inline comments.Jul 22 2016, 4:20 AM
ELF/LinkerScript.cpp
849 ↗(On Diff #65049)

Can you reuse readSectionsCommandExpr() ?

ELF/Writer.cpp
1163 ↗(On Diff #65049)

If your patch is assigning LMA for *sections* then it should be done in LinkerScript::assignAddresses, I think

test/ELF/linkerscript-at.s
66 ↗(On Diff #65049)

I have feeling that section alignment is not done for LMA

grimar added inline comments.Jul 22 2016, 4:24 AM
ELF/Writer.cpp
1163 ↗(On Diff #65049)

Section header has only one address field sh_addr which is VMA:
https://docs.oracle.com/cd/E19455-01/806-3773/elf-2/index.html
So there is not way to set LMA for section, only for load segment itself I believe.
And that means that for section that explicitly requested LMA new load has
to be created.

Both gold and bfd do the same.

evgeny777 added inline comments.Jul 22 2016, 4:37 AM
ELF/Writer.cpp
1163 ↗(On Diff #65049)

Well, OutputSectionBase contains not only header, but PageAlign, SectionIndex and Name.
May be adding some extra field like PhysicalAddress to OutputSectionBase is not that bad?

grimar added inline comments.Jul 22 2016, 6:40 AM
test/ELF/linkerscript-at.s
66 ↗(On Diff #65049)

I guess you meant VMA.

As far I understand the feature, AT just sets the physical address field of PT_LOAD. VMA is different and should be controlled
by location counter or something else. Setting PA gives ability to control LMA for sections.
I think the output after applying my patch is equal to what gnu ld do.
gold also touches the file offsets of the sections, gnu ld does not, this patch also does not.

Would be interesting to hear opinion from reviewers

grimar added a subscriber: davide.Jul 25 2016, 11:21 PM
grimar updated this revision to Diff 65730.Jul 27 2016, 6:51 AM
  • Rebased.

FYI the linker script parser now handles all of the FreeBSD kernel linker script except for AT. Now the only kernel linker script workaround in my test FreeBSD tree is:

diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64
index d445857..3efca9a 100644
--- a/sys/conf/ldscript.amd64
+++ b/sys/conf/ldscript.amd64
@@ -8,12 +8,7 @@ SECTIONS
   /* Read-only sections, merged into text segment: */
   kernphys = CONSTANT (MAXPAGESIZE);
   . = kernbase + kernphys + SIZEOF_HEADERS;
-  /*
-   * Use the AT keyword in order to set the right LMA that contains
-   * the physical address where the section should be loaded. This is
-   * needed for the Xen loader which honours the LMA.
-   */
-  .interp         : AT (kernphys + SIZEOF_HEADERS) { *(.interp) }
+  .interp         : { *(.interp) }
   .hash           : { *(.hash) }
   .gnu.hash       : { *(.gnu.hash) }
   .dynsym         : { *(.dynsym) }

FYI the linker script parser now handles all of the FreeBSD kernel linker script except for AT. Now the only kernel linker script workaround in my test FreeBSD tree is:

diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64
index d445857..3efca9a 100644
--- a/sys/conf/ldscript.amd64
+++ b/sys/conf/ldscript.amd64
@@ -8,12 +8,7 @@ SECTIONS
   /* Read-only sections, merged into text segment: */
   kernphys = CONSTANT (MAXPAGESIZE);
   . = kernbase + kernphys + SIZEOF_HEADERS;
-  /*
-   * Use the AT keyword in order to set the right LMA that contains
-   * the physical address where the section should be loaded. This is
-   * needed for the Xen loader which honours the LMA.
-   */
-  .interp         : AT (kernphys + SIZEOF_HEADERS) { *(.interp) }
+  .interp         : { *(.interp) }
   .hash           : { *(.hash) }
   .gnu.hash       : { *(.gnu.hash) }
   .dynsym         : { *(.dynsym) }

Yes, I know. I am testing it with the same "change". I think 10.3 release sources head does not contain AT and I using them.

I am not sure how to test properly that AT implementation is correct ? Since it is not important to have for FreeBSD.

FYI the linker script parser now handles all of the FreeBSD kernel linker script except for AT. Now the only kernel linker script workaround in my test FreeBSD tree is:

diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64
index d445857..3efca9a 100644
--- a/sys/conf/ldscript.amd64
+++ b/sys/conf/ldscript.amd64
@@ -8,12 +8,7 @@ SECTIONS
   /* Read-only sections, merged into text segment: */
   kernphys = CONSTANT (MAXPAGESIZE);
   . = kernbase + kernphys + SIZEOF_HEADERS;
-  /*
-   * Use the AT keyword in order to set the right LMA that contains
-   * the physical address where the section should be loaded. This is
-   * needed for the Xen loader which honours the LMA.
-   */
-  .interp         : AT (kernphys + SIZEOF_HEADERS) { *(.interp) }
+  .interp         : { *(.interp) }
   .hash           : { *(.hash) }
   .gnu.hash       : { *(.gnu.hash) }
   .dynsym         : { *(.dynsym) }

Yes, I know. I am testing it with the same "change". I think 10.3 release sources head does not contain AT and I using them.

I am not sure how to test properly that AT implementation is correct ? Since it is not important to have for FreeBSD.

As far as I can tell the Xen loader in FreeBSD loader uses it. Eventually we might want to try (or ask somebody to try).
I think that getting X86-64 (and maybe other tier1 or tier2-almost-1 arches) to run is an higher priority (at least it is for me:))

FYI the linker script parser now handles all of the FreeBSD kernel linker script except for AT. Now the only kernel linker script workaround in my test FreeBSD tree is:

diff --git a/sys/conf/ldscript.amd64 b/sys/conf/ldscript.amd64
index d445857..3efca9a 100644
--- a/sys/conf/ldscript.amd64
+++ b/sys/conf/ldscript.amd64
@@ -8,12 +8,7 @@ SECTIONS
   /* Read-only sections, merged into text segment: */
   kernphys = CONSTANT (MAXPAGESIZE);
   . = kernbase + kernphys + SIZEOF_HEADERS;
-  /*
-   * Use the AT keyword in order to set the right LMA that contains
-   * the physical address where the section should be loaded. This is
-   * needed for the Xen loader which honours the LMA.
-   */
-  .interp         : AT (kernphys + SIZEOF_HEADERS) { *(.interp) }
+  .interp         : { *(.interp) }
   .hash           : { *(.hash) }
   .gnu.hash       : { *(.gnu.hash) }
   .dynsym         : { *(.dynsym) }

Yes, I know. I am testing it with the same "change". I think 10.3 release sources head does not contain AT and I using them.

I am not sure how to test properly that AT implementation is correct ? Since it is not important to have for FreeBSD.

As far as I can tell the Xen loader in FreeBSD loader uses it. Eventually we might want to try (or ask somebody to try).
I think that getting X86-64 (and maybe other tier1 or tier2-almost-1 arches) to run is an higher priority (at least it is for me:))

But, to clarify, it's important :) (or at least there are some people using Xen which care about it =))

phosek added a subscriber: phosek.Aug 12 2016, 5:22 PM

We'd like to use LLD for linking LK kernel and the AT keyword support is currently the only thing that's stopping us, so landing this change would really help us as well.

In D19272#407140, @ruiu wrote:

I wonder if you are implementing this feature for some specific product. For example, are you trying to link some kernel or something? Or are you doing this for the sake of completeness? It seems to me that AT command is a tricky option that you want to use only when you are dealing with embedded systems or something.

IMHO that's the case not only for AT command but for linker scripts in general, vast majority of linker script use cases would be kernels and embedded systems.

We'd like to use LLD for linking LK kernel and the AT keyword support is currently the only thing that's stopping us, so landing this change would really help us as well.

Did you try to link with this patch applied ? I am curios whether this patch is enough to support AT.

ruiu added a comment.Aug 15 2016, 4:28 PM

This change needs a test.

ELF/LinkerScript.cpp
334 ↗(On Diff #65730)

Small but noticeable duplicate code. Please define hasLma using getLma.

return getLma(Name) != -1;
ELF/Writer.cpp
993 ↗(On Diff #65730)

I don't think this will create a valid PHDR if there's a section with an AT command followed by another section without AT.

.foo 0x1000 : AT(0) { ... }
.bar 0x2000 { ... }

For example, if .bar's flags are the same as .foo's, then this code would put both .foo and .bar into the same segment.

1643 ↗(On Diff #54482)

Remove this comment. I think it is too terse and obvious.

grimar updated this revision to Diff 68144.Aug 16 2016, 12:16 AM
  • Restored lost testcase.
  • Rebased, addressed review comments.
ELF/LinkerScript.cpp
334 ↗(On Diff #65730)

Not sure I could do that. getLma had VA argument to use as a Dot and returned
the result of expression. It would be not correct to compare expression result with -1,
also it would introduce excessive computation of expression (first for hasLma() and then for getLma()).

I reimplemented this.

ELF/Writer.cpp
993 ↗(On Diff #65730)

It is expected behavior. AT starts new segment but other sections can be put inside it as well.
It just like ADDR() for LMA.

Using next script ld will create 2 loads:

SECTIONS { 
  . = 0x1000; 
  .aaa : AT(0x2000) 
  { 
   *(.aaa) 
  } 
  .bbb : 
  { 
   *(.bbb) 
  } 
  .ccc : AT(0x3000) 
  { 
   *(.ccc) 
  } 
  .text : { *(.text*) }
}

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000001000
                 0x0000000000001010 0x0000000000001010  R      200000
  LOAD           0x0000000000001010 0x0000000000001010 0x0000000000003000
                 0x0000000000000009 0x0000000000000009  R E    200000

 Section to Segment mapping:
  Segment Sections...
   00     .aaa .bbb 
   01     .ccc .text
1158 ↗(On Diff #65730)

Done.

ruiu accepted this revision.Aug 16 2016, 2:04 PM
ruiu edited edge metadata.

LGTM

ELF/Writer.cpp
1137–1138 ↗(On Diff #68144)
if (Expr LmaExpr = Script<ELFT>::X->getLma(First->getName())
1139 ↗(On Diff #68144)

I have a concern that the stuff for the linker script is leaking here. Expr is for the linker script, and I wanted to keep Writer being agnostic about it. From that point of view, this is not very clean. However, being said that, I have no good alternative suggestion, so I'm OK with this.

This revision is now accepted and ready to land.Aug 16 2016, 2:04 PM
This revision was automatically updated to reflect the committed changes.