[LLD][ELF] Pre-create ThunkSections at Target specific intervals
Needs ReviewPublic

Authored by peter.smith on Tue, Jun 27, 7:54 AM.

Details

Reviewers
ruiu
rafael
Summary

When an OutputSection is larger than the branch range for a Target we need to place thunks such that they are always in range of their caller, and sufficiently spaced to maximise the number of callers that can use the thunk. We use the simple heuristic of placing the ThunkSection at intervals corresponding to a target specific branch range. If the OutputSection is small we put the thunks at the end of the executable sections.

The overall design for range thunks is for each InputSectionDescription pre-create ThunkSections at regularly spaced intervals in which we can place range extension thunks that are within range of their caller. This is similar to the stub-groups implemented by bfd and clang.

This is patch 6/11 of range thunks. It is dependent on D34688 and its dependencies.

Diff Detail

peter.smith created this revision.Tue, Jun 27, 7:54 AM

Rebased, no other changes.

grimar added a subscriber: grimar.Fri, Jul 7, 12:52 AM
grimar added inline comments.
ELF/Relocations.cpp
979

You can use llvm::remove_if.

1016

It is equal to next, right ?

if (!ThunkSections[ISR].empty())
  return ThunkSections[ISR].front();

Updated to incorporate George's comments. I've applied both suggestions.

ruiu added inline comments.Mon, Jul 10, 3:36 PM
ELF/Arch/ARM.cpp
66

Can you extend the comment so that it is more friendly to those who don't know much about ARM? e.g. Thumb unconditional branch instructions have xx bit displacement, so it can jump to +-yy bytes. We insert thunks for at least yy bytes to support longer jumps.

68

Why do you want to limit the size of thunk size?

ELF/Relocations.cpp
1011–1025

What is ISR? Adding a function comment would help.

peter.smith marked 3 inline comments as done.Tue, Jul 11, 4:52 AM

Thanks for the comments, I'll update the diff.

What I'm aiming for is:

Section 0.0
...
Precreated RangeThunk
; (dot) - Section 0.0->OutSecOff < ThunkSectionSpacing)
Section 1.0
...
Precreated RangeThunk
; (dot) - Section 1.0->OutSecOff < ThunkSectionSpacing)

This is a simple way of making sure that as many branches within the gaps between ThunkSections can reach their targets without needing additional ThunkSections creating on later passes.

ELF/Arch/ARM.cpp
66

I've added to the comment.

68

What I'm trying to achieve is something like:

Range of sections:
start:
Section 0 at OutSecOff 0
...
RangeThunk at offset OFF from start. Where (RangeThunk.OutSecOff + RangeThunk.Size) - start < ThunkSectionSpacing

The ThunkSectionSize is a pessimistic guess at the final RangeThunk.Size so that we can guarantee that a branch at offset 0 can reach anywhere within RangeThunk.

ELF/Relocations.cpp
1011–1025

I've added a comment. ISR is meant to stand for Input Section Range, which is supposed to represent a range of InputSections (specifically those that come from a single InputSectionDescription), I used range rather than description as at one point we had a distinction between the script and non-script cases.

I'm open to suggestions for better names, or I can expand them to make them a bit more self-descriptive?

peter.smith marked 3 inline comments as done.

Updated diffs to add additional comments. No change to code.

ruiu added inline comments.Tue, Jul 11, 3:35 PM
ELF/Arch/ARM.cpp
66

By "b", I believe you actually meant "B" (i.e. not bit but byte). Probably the best way of representing it is 16 MiB.

ELF/Relocations.cpp
982

Is this the same as Thunks.empty()?

1070–1072

It seems IS is always a member of ISR, but it is not obvious why forEachExecInputSection is designed like that. Maybe we should change the type of the function from

forEachExecInputSection(ArrayRef<OutputSectionCommand *> OutputSections,
    std::function<void(OutputSection *, std::vector<InputSection *> *,
                       InputSection *)> Fn)

to

forEachExecInputSection(ArrayRef<OutputSectionCommand *> OutputSections,
    std::function<void(OutputSection *, std::vector<InputSection *> *)> Fn)

and run a for-loop inside the callback function.

1134–1135

Unnecessary style change.

ELF/Target.h
78

nit: just 0 as GotBaseSymOff is set to just 0.

peter.smith marked 4 inline comments as done.Wed, Jul 12, 6:53 AM

Thanks very much for the review comments, I'll update the diff shortly.

ELF/Relocations.cpp
982

Not quite as remove_if won't reduce the size of the vector unless Thunks.erase is called. Given that I managed to forget to consistently use ThunkBegin and ThunkEnd after various review cycles makes me think it is best to call Thunks.erase(). I've made that update as it means that ThunkBegin and ThunkEnd can be removed.

1070–1072

I've made the change. The reason I originally had it in forEachExecInputSection() was to reduce the level of indentation needed in the callbacks.

1134–1135

I've run clang-format on the file, hopefully there aren't too many just white-space differences.

ELF/Target.h
78

Done

peter.smith marked 3 inline comments as done.

Updated diff to address review comments.

ikudrin added inline comments.
ELF/Relocations.cpp
1086

As now you create Thunk sections in advance, it is possible that they stay empty when ThunkCreator::mergeThunks() is called. In that case, it might call getTargetInputSection() on an empty ThunkSections, which, on its turn, calls front() on an empty vector Thunks.

Thanks for the comment, I've put a response inline and hope to update the diff later today.

ELF/Relocations.cpp
1086

I think that with the most recent update I've changed the bit of code in mergeThunks to erase ThunkSections with 0 Thunks. Having said that given that getTargetInputSection() is supposed to be able to return nullptr when there is no target I think it makes sense to return nullptr in ThunkSection::getTargetInputSection() if there are no Thunks.

I'll aim to update the diff later today.

ikudrin added inline comments.Thu, Jul 13, 8:01 AM
ELF/Relocations.cpp
1086

Oh, you are right. I didn't notice the change. I was playing with a patch from D34634 which was a bit outdated.

Thank you for the work!

Updated diff to return nullptr from ThunkSection::getTargetInputSection() when there are no Thunks.

It is possible that a precreated thunk section stays empty in the first pass, but is used in the next pass. In that case, it isn't added into ISR in ThunkCreator::mergeThunks(). For example:

$ cat a.s
	.global _start, foo
.section .text.start,"ax",%progbits
_start:
	bl _start
.section .text.dummy1,"ax",%progbits
.space 0xfffffe
.section .text.foo,"ax",%progbits
foo:
	bl _start
$ llvm-mc -filetype=obj -triple=thumbv7a-none-linux-gnueabi a.s -o a.o
$ ld.lld a.o -o a.out
$ objdump -d a.out
a.out:     file format elf32-littlearm


Disassembly of section .text:

00011000 <_start>:
   11000:	f7ff effe 	blx	11000 <_start>

00011004 <__Thumbv7ABSLongThunk__start>:
   11004:	f241 0c00 	movw	ip, #4096	; 0x1000
   11008:	f2c0 0c01 	movt	ip, #1
   1100c:	4760      	bx	ip
	...

01011002 <__Thumbv7ABSLongThunk__start>:
	...

0101100c <foo>:
 101100c:	f7ff fff9 	bl	1011002 <__Thumbv7ABSLongThunk__start>

By the way, this sample shows one additional issue: redundant thunks are not removed.

It is possible that a precreated thunk section stays empty in the first pass, but is used in the next pass. In that case, it isn't added into ISR in ThunkCreator::mergeThunks(). For example:

$ cat a.s
	.global _start, foo
.section .text.start,"ax",%progbits
_start:
	bl _start
.section .text.dummy1,"ax",%progbits
.space 0xfffffe
.section .text.foo,"ax",%progbits
foo:
	bl _start
$ llvm-mc -filetype=obj -triple=thumbv7a-none-linux-gnueabi a.s -o a.o
$ ld.lld a.o -o a.out
$ objdump -d a.out
a.out:     file format elf32-littlearm


Disassembly of section .text:

00011000 <_start>:
   11000:	f7ff effe 	blx	11000 <_start>

00011004 <__Thumbv7ABSLongThunk__start>:
   11004:	f241 0c00 	movw	ip, #4096	; 0x1000
   11008:	f2c0 0c01 	movt	ip, #1
   1100c:	4760      	bx	ip
	...

01011002 <__Thumbv7ABSLongThunk__start>:
	...

0101100c <foo>:
 101100c:	f7ff fff9 	bl	1011002 <__Thumbv7ABSLongThunk__start>

By the way, this sample shows one additional issue: redundant thunks are not removed.

I think you are correct, and I'll need to address the problem of ThunkSections empty on the first pass but not used. I think D34692 would be the better place to do so as this is where multiple places are introduced. Would it be possible to make the comment in D34692, or I can copy it across?

I deliberately didn't try and remove Thunks that are redundant, I'm expecting redundant thunks to only happen when there is a Thunk that has been knocked out of range of all of its callers and I think this will be rare enough to not warrant the extra complexity.

Rebased to account for r308056 and r308057. I have also accounted for Rafael's observation in createInitialThunkSections that the spacing should subtract Target->ThunkSectionSize. I'll account for Igor's comment in D34692

Rebased and run through clang-format to account for "r308297 - Apply clang-format. NFC.". No other changes to patch.