This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Convert linker generated sections to input sections
ClosedPublic

Authored by evgeny777 on Oct 14 2016, 10:52 AM.

Details

Summary

ld and gold allow linker-generated sections to be used as inputs when building other output sections, for example:

SECTIONS {
  .got : { *(.got.plt) *(.got) }
}

This creates a single output section .got with the contents of the linker-generated .got and .got.plt sections. This patch makes it possible to do the same in lld.
This is done by using a 'proxy' input section, which holds a pointer to the linker-generated section and can be added to OutputSection<ELFT>
just like any other regular input section. In the future this can also be used to create a single output section for mergeable/non-mergeable inputs,
if the linker script demands it.
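
A minimal sketch of the proxy idea described above, not the code from the patch. All class and member names here (GeneratedSection, ProxyInputSection, the simplified OutputSection) are hypothetical stand-ins, and the real lld classes are templates over ELFT with many more responsibilities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a linker-generated section such as .got.
struct GeneratedSection {
  std::string Name;
  std::vector<uint8_t> Contents;
};

// Common interface for anything an output section can contain.
struct InputSectionBase {
  virtual ~InputSectionBase() = default;
  virtual size_t getSize() const = 0;
  virtual void writeTo(uint8_t *Buf) const = 0;
};

// Regular input section backed by bytes read from an object file.
struct RegularInputSection : InputSectionBase {
  std::vector<uint8_t> Data;
  explicit RegularInputSection(std::vector<uint8_t> D) : Data(std::move(D)) {}
  size_t getSize() const override { return Data.size(); }
  void writeTo(uint8_t *Buf) const override {
    if (!Data.empty())
      std::memcpy(Buf, Data.data(), Data.size());
  }
};

// The proxy: holds a pointer to a generated section and forwards to it.
struct ProxyInputSection : InputSectionBase {
  const GeneratedSection *Target;
  explicit ProxyInputSection(const GeneratedSection *T) : Target(T) {}
  size_t getSize() const override { return Target->Contents.size(); }
  void writeTo(uint8_t *Buf) const override {
    if (!Target->Contents.empty())
      std::memcpy(Buf, Target->Contents.data(), Target->Contents.size());
  }
};

// Simplified output section: concatenates its inputs in script order.
struct OutputSection {
  std::string Name;
  std::vector<const InputSectionBase *> Sections;

  void addSection(const InputSectionBase *S) { Sections.push_back(S); }

  std::vector<uint8_t> finalize() const {
    size_t Size = 0;
    for (const InputSectionBase *S : Sections)
      Size += S->getSize();
    std::vector<uint8_t> Out(Size);
    size_t Off = 0;
    for (const InputSectionBase *S : Sections) {
      S->writeTo(Out.data() + Off);
      Off += S->getSize();
    }
    assert(Off == Size && "every input must write exactly getSize() bytes");
    return Out;
  }
};
```

With this shape, a script rule like `.got : { *(.got.plt) *(.got) }` amounts to adding two proxies to one OutputSection in that order; the output section does not need to know that its inputs are linker-generated.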

Diff Detail

Repository
rL LLVM

Event Timeline

evgeny777 updated this revision to Diff 74713.Oct 14 2016, 10:52 AM
evgeny777 retitled this revision from to [ELF] Allow linker script to use .got and .got.plt sections as inputs.
evgeny777 updated this object.
evgeny777 added reviewers: ruiu, rafael.
evgeny777 set the repository for this revision to rL LLVM.
evgeny777 added a project: lld.
evgeny777 added subscribers: grimar, ikudrin, llvm-commits.
grimar added inline comments.Oct 17 2016, 12:44 AM
ELF/InputSection.h
342 ↗(On Diff #74713)

Do you need/use these typedefs?

test/ELF/linkerscript/output-to-input.s
16 ↗(On Diff #74713)

Do you really need to specify the order of sections to demonstrate the output with and without merging of .got/.got.plt?
I mean, shouldn't the following work?

# RUN: echo "SECTIONS { \
# RUN:  .got : {} \
# RUN:  .got.plt : {} \
# RUN: }" > %t1.script
# RUN: echo "SECTIONS { \
# RUN:  .got : { *(.got) *(.got.plt)  } \
# RUN: }" > %t2.script
42 ↗(On Diff #74713)

You mean the last 32 bytes? (3 + 1) entries * 8.

evgeny777 added inline comments.Oct 17 2016, 2:41 AM
test/ELF/linkerscript/output-to-input.s
16 ↗(On Diff #74713)

You need equal offsets between the .plt and .got.plt sections in both cases to get the same .got.plt contents.
Of course it's possible to create smaller examples with the same effect, but then the result would depend on the orphan
section placement algorithm.

42 ↗(On Diff #74713)

You're right - that's a mistake.

evgeny777 updated this revision to Diff 74822.Oct 17 2016, 2:57 AM

Addressed review comments

ruiu edited edge metadata.Oct 17 2016, 4:15 PM

In what situation would you want to put both .got and .got.plt into a .got section? I mean, if you want to handle

.got { *(.got) }

we can just ignore such commands.

The reason I'm doing this is that I have the same linker script for armv7 and aarch64 and for all linkers (gold/ld/lld). This script should enforce two constraints:

  1. PLT GOT entries should go before all other GOT entries
  2. I need to have start and end symbols for GOT

Both constraints need to be enforced for dynamic loader to work properly. Currently I have something like this in my linker script:

.got  : { 
   PROVIDE_HIDDEN(__got_start = .);
   *(.got.plt) *(.got) 
   PROVIDE_HIDDEN(__got_end = .);
}

The problems I face with current version are:

  1. On armv7 neither gold nor ld creates a .got.plt section, but lld does.
  2. On aarch64 all linkers create .got.plt, but on some occasions lld doesn't create a .got section, while ld/gold do. Also, even if I specify the correct order in the linker script, an orphan section may sometimes be placed between .got.plt and .got and prevent the DSO from being loaded correctly.

BTW, it looks like both gold and ld allow placing all linker-generated sections wherever you want. Maybe it makes sense to make this patch more generic?

ruiu added a comment.Oct 18 2016, 12:00 PM

Honestly I think I wouldn't want to support this unless it is absolutely necessary or there's a better way to support it than the "virtual input section".

First, the SECTIONS command seems odd. The SECTIONS command gathers input sections and puts them into an output section, but there are no .got or .got.plt sections in any input file. Therefore, even though

SECTIONS .got { *(.got) }

looks similar to

SECTIONS .text { *(.text) },

they are very different in semantics. The latter aggregates all .text sections from input files and puts them into a .text section, but the former instructs the linker to put a linker-created .got section into .got. The former pattern seems semantically wrong and beyond the limit of what we could do with the SECTIONS command.

Second, wrapping an output section with a "virtual" input section seems pretty odd, too. That's a direct consequence of the oddity of the feature, but output sections are not input sections, so it's really confusing.

evgeny777 updated this revision to Diff 75575.Oct 24 2016, 7:23 AM
evgeny777 edited edge metadata.
evgeny777 removed rL LLVM as the repository for this revision.

Converted .got and .got.plt sections to inputs according to RFC from Rui Ueyama. All tests pass.

ruiu added a comment.Oct 24 2016, 6:47 PM

I think this is still too large to judge whether the very concept of a virtual input section is good or not. The GOT section is used everywhere, so you had to update many places to convert it. Could you convert a much less important section first? Say, build-id?

ELF/InputSection.h
351 ↗(On Diff #75575)

Does Header = {}; initialize with zero?

evgeny777 added inline comments.Oct 25 2016, 6:31 AM
ELF/InputSection.h
351 ↗(On Diff #75575)

Yes, but unfortunately the base class is initialized first and it uses Header.
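
To illustrate the ordering problem: in C++, base classes are constructed before non-static data members, so a default member initializer such as `Header = {}` has not run yet when the base constructor reads the header. One common workaround, sketched below with hypothetical names (SectionHeader, HeaderHolder, SyntheticSection), is to move the member into a helper base that is listed first. This is only an illustration of the language rule and is not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily reduced section header.
struct SectionHeader {
  uint32_t Type = 0;
  uint64_t Flags = 0;
};

// Base class that reads the header inside its constructor.
struct InputSectionBase {
  uint32_t CachedType;
  explicit InputSectionBase(const SectionHeader *H) : CachedType(0) {
    assert(H && "header must exist before the base class is constructed");
    CachedType = H->Type;
  }
};

// Helper base that owns the header. Because it is listed first in the
// derived class, it is fully constructed before InputSectionBase runs.
struct HeaderHolder {
  SectionHeader Header;
  HeaderHolder(uint32_t Type, uint64_t Flags) {
    Header.Type = Type;
    Header.Flags = Flags;
  }
};

struct SyntheticSection : private HeaderHolder, InputSectionBase {
  SyntheticSection(uint32_t Type, uint64_t Flags)
      : HeaderHolder(Type, Flags), InputSectionBase(&Header) {}

  const SectionHeader &header() const { return Header; }
};
```

Had Header been an ordinary member of SyntheticSection, the pointer passed to InputSectionBase would refer to storage whose initializer had not yet run.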

evgeny777 updated this revision to Diff 75694.Oct 25 2016, 6:33 AM
evgeny777 retitled this revision from [ELF] Allow linker script to use .got and .got.plt sections as inputs to [ELF] Convert linker generated sections to input sections.

Converted BuildId section to input.

Apologies in advance if I'm missing something, but I'm not completely sure why we need a special SyntheticInputSection?
Is it because we want to retain a similar custom interface to make generating the Section easier, while still allowing it to be placed as an InputSection? Or is it to get around the lack of an InputFile for the SyntheticInputSection? If it is the latter I would have thought a single SyntheticInputFile might be less disruptive.

Another possible use case of linker generated InputSections is Thunks (or other linker generated code sequences/data) as InputSections, which would need to be mixed in with other InputSections from the file. Would this be possible with SyntheticInputSections?

As an aside for ARM PLT and GOT sections:

I've not got a good reason for why ARM only uses a single .got section on ld.bfd and gold. A best guess is that ARM also had to support Symbian OS at the time which only had a single .got section. At the time of doing the initial ARM port I thought about making a single .got section to match gold and ld.bfd, but given the tests I've been running against ld-linux-armhf.so.3 haven't caused a problem I didn't think it worth the additional disruption to the code-base. Is there a widespread use case where this doesn't work?

Thanks for your comment, Peter.

a) SyntheticInputSection is actually a helper class that simplifies conversion from an output section to an input section. It has no other purpose; it provides just a helper constructor and a base interface that is partially moved from OutputSectionBase<ELFT>.

b) AFAIK thunks in lld are created in a different way: no input section is created for them; instead there is an array of Thunk<ELFT> objects which is a member of the InputSection<ELFT> class. This array is used to determine the real section size. The real thunk code is generated when sections are written.

c) As I said earlier, neither ld nor gold generates a .got section on 32-bit ARM platforms. I don't know if this will ever change. On the other hand, lld sometimes doesn't create a .got section when both ld and gold do. To abstract the differences between 32- and 64-bit ARM we merge .got and .got.plt into a single section in linker scripts. Both gold and ld allow doing this. Another possible use of this patch is discarding certain linker-generated sections in a linker script, which can't be done now.
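
A rough sketch of the thunk arrangement described in (b), under the assumption that thunks are appended after the section's own data. The names below (Thunk, FillerThunk, the non-template InputSection) are simplified and hypothetical; the real classes are templates over ELFT and emit target-specific code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical thunk interface: knows its size, emits its bytes on demand.
struct Thunk {
  virtual ~Thunk() = default;
  virtual size_t size() const = 0;
  virtual void writeTo(uint8_t *Buf) const = 0;
};

// Hypothetical thunk that emits four filler bytes instead of real code.
struct FillerThunk : Thunk {
  size_t size() const override { return 4; }
  void writeTo(uint8_t *Buf) const override { std::memset(Buf, 0x90, 4); }
};

struct InputSection {
  std::vector<uint8_t> Data;         // bytes from the object file
  std::vector<const Thunk *> Thunks; // no separate input section for these

  size_t getThunksSize() const {
    size_t Total = 0;
    for (const Thunk *T : Thunks)
      Total += T->size();
    return Total;
  }

  // The real section size includes the thunks, so offsets stay correct.
  size_t getSize() const { return Data.size() + getThunksSize(); }

  // Thunk bytes are produced only here, when the section is written.
  void writeTo(uint8_t *Buf) const {
    if (!Data.empty())
      std::memcpy(Buf, Data.data(), Data.size());
    size_t Off = Data.size();
    for (const Thunk *T : Thunks) {
      assert(T && "thunk list must not contain null entries");
      T->writeTo(Buf + Off);
      Off += T->size();
    }
  }
};
```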

ruiu added a comment.Oct 26 2016, 5:14 PM

Overall, this is heading in the right direction. Please rebase on SVN head because Rafael made a few changes that you can use in this patch.

ELF/Writer.cpp
780 ↗(On Diff #75694)

Remove null check. Sections shouldn't contain a nullptr.

869 ↗(On Diff #75694)

Why do you have to call S->Outsec->assignOffsets?

1495 ↗(On Diff #75694)

You want to use needed() for consistency.

evgeny777 added inline comments.Oct 28 2016, 4:35 AM
ELF/Writer.cpp
780 ↗(On Diff #75694)

Actually they can. In the original version, if the build-id section is not emitted then Out<ELFT>::BuildId is nullptr. isDiscarded() checks for nullptr as well.

869 ↗(On Diff #75694)

In order for OutSec to have non-zero size. Normally assignOffsets is called earlier in createSections().

1495 ↗(On Diff #75694)

I've removed needed() in upcoming diff, because it isn't required for BuildIdSection.

evgeny777 updated this revision to Diff 76177.Oct 28 2016, 4:36 AM

Addressed review comments

ruiu added inline comments.Oct 28 2016, 2:48 PM
ELF/InputSection.cpp
863 ↗(On Diff #76177)

Here, you fill the internal buffer with data, which is then memcpy'ed to the final output buffer. I think we don't want to do that. Instead, we want to virtualize InputFile's writeTo so that each input section writes directly to the output buffer.

ELF/InputSection.h
254–255 ↗(On Diff #76177)

This field is very specific to build-id, which needs to back-fill the build-id value after we get a complete output file. No other sections should need it, and that's a good thing because it keeps most sections agnostic of where they are written. So I'd move this to BuildIdSection, where it used to be.
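
For context, the back-fill pattern mentioned here could look roughly like the sketch below: the section writes a zeroed placeholder, remembers where it landed, and is patched once the whole file has been written. The field name HashBuf and the method writeBuildId are assumptions for illustration, and FNV-1a is used only as a simple stand-in for whatever hash the linker is configured to use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in hash (64-bit FNV-1a); a real linker would use a stronger hash.
static uint64_t fnv1a(const uint8_t *P, size_t Len) {
  uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (size_t I = 0; I < Len; ++I) {
    H ^= P[I];
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}

constexpr size_t HashSize = 8;

struct BuildIdSection {
  // Where the placeholder lives in the output buffer. This is the only
  // section in the sketch that needs to remember its output location.
  uint8_t *HashBuf = nullptr;

  size_t getSize() const { return HashSize; }

  // First pass: reserve space with zeros and remember the location.
  void writeTo(uint8_t *Buf) {
    std::memset(Buf, 0, HashSize);
    HashBuf = Buf;
  }

  // Second pass: hash the complete file, then patch the placeholder
  // with the result in little-endian byte order.
  void writeBuildId(const uint8_t *Start, size_t Len) {
    assert(HashBuf && "writeTo must run before writeBuildId");
    uint64_t H = fnv1a(Start, Len);
    for (size_t I = 0; I < HashSize; ++I)
      HashBuf[I] = static_cast<uint8_t>(H >> (8 * I));
  }
};
```

Keeping the pointer inside BuildIdSection, rather than in the common base, is what lets every other section stay unaware of its final position in the output buffer.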

evgeny777 added inline comments.Oct 31 2016, 6:05 AM
ELF/InputSection.cpp
863 ↗(On Diff #76177)

Besides writeTo, one would also have to virtualize getSize() for assignOffsets to work properly.

evgeny777 added inline comments.Oct 31 2016, 7:04 AM
ELF/InputSection.cpp
863 ↗(On Diff #76177)

Also, if you virtualize getSize() then you'll get different values from Data.size() and getSize(). getDataAs<T> won't work either. Does the extra copying really add that much overhead?
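
The trade-off under discussion can be sketched as two hypothetical variants of a synthetic section holding one 64-bit value. BufferedSection fills Data once and pays for an extra copy on output; DirectSection overrides getSize() and writeTo() and writes straight into the output buffer, but then Data.size() no longer agrees with getSize(). The names are illustrative and not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Store a 64-bit value in little-endian byte order.
static void putLE64(uint8_t *Buf, uint64_t V) {
  for (int I = 0; I < 8; ++I)
    Buf[I] = static_cast<uint8_t>(V >> (8 * I));
}

struct InputSectionBase {
  std::vector<uint8_t> Data;
  virtual ~InputSectionBase() = default;
  virtual size_t getSize() const { return Data.size(); }
  virtual void writeTo(uint8_t *Buf) const {
    if (!Data.empty())
      std::memcpy(Buf, Data.data(), Data.size());
  }
};

// Variant A: fill the internal buffer up front. Data stays usable by any
// code that inspects it, at the cost of one extra memcpy on output.
struct BufferedSection : InputSectionBase {
  explicit BufferedSection(uint64_t Value) {
    Data.resize(8);
    putLE64(Data.data(), Value);
  }
};

// Variant B: virtualized size and output. No intermediate buffer, but
// Data is empty, so Data.size() and getSize() disagree.
struct DirectSection : InputSectionBase {
  uint64_t Value;
  explicit DirectSection(uint64_t V) : Value(V) {}
  size_t getSize() const override { return 8; }
  void writeTo(uint8_t *Buf) const override {
    assert(Buf != nullptr);
    putLE64(Buf, Value);
  }
};

// Emit a section into a fresh buffer sized by getSize().
static std::vector<uint8_t> emit(const InputSectionBase &S) {
  std::vector<uint8_t> Out(S.getSize());
  S.writeTo(Out.data());
  return Out;
}
```

Both variants produce the same output bytes; they differ only in whether helpers that read Data directly keep working.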

ruiu accepted this revision.Oct 31 2016, 10:36 AM
ruiu edited edge metadata.

LGTM

ELF/InputSection.cpp
863 ↗(On Diff #76177)

I don't know about that. We might want to create a large virtual section, but I don't have a concrete example now. Because this patch is pretty straightforward and simple, let's land this. We can revisit later.

This revision is now accepted and ready to land.Oct 31 2016, 10:36 AM
evgeny777 added inline comments.Oct 31 2016, 10:38 AM
ELF/InputSection.cpp
863 ↗(On Diff #76177)

Ok. Does it make sense to convert other sections (like .got and .got.plt) now ?

ruiu added a comment.Oct 31 2016, 10:40 AM

Yes, but please do one at a time at the moment.

This revision was automatically updated to reflect the committed changes.