This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF][AArch64] Add support for AArch64 range extension thunks.
ClosedPublic

Authored by peter.smith on Nov 7 2017, 10:15 AM.

Details

Summary

The AArch64 unconditional branch (B) and branch-with-link (BL) instructions have a maximum range of +/- 128 MiB. This is enough for most programs, but very large programs, or programs using a linker script that places code far apart, can exceed this range. This change adds support for range-extension thunks to AArch64.
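The +/- 128 MiB figure comes from the encoding. B and BL hold a signed 26-bit immediate counted in 4-byte words, so the reachable byte displacement is [-2^27, 2^27 - 4]. The helper below is a hypothetical sketch of the check a linker makes before deciding that a thunk is needed. The name `fitsInBranch26` is illustrative and is not LLD's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLD's API): can a B or BL at address `src`
// reach `dst` directly? The instruction holds a signed 26-bit word offset,
// so the byte displacement must lie in [-2^27, 2^27 - 4] (about +/- 128 MiB).
bool fitsInBranch26(std::uint64_t src, std::uint64_t dst) {
  // Unsigned subtraction wraps; reinterpreting the result as signed gives
  // the two's-complement displacement.
  const std::int64_t disp = static_cast<std::int64_t>(dst - src);
  assert((disp & 3) == 0 && "AArch64 instructions are 4-byte aligned");
  const std::int64_t limit = std::int64_t(1) << 27; // 128 MiB
  return disp >= -limit && disp < limit;
}
```

When this returns false for a call, the call has to be redirected to a thunk that is itself within range.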

Implementation Notes:

  • I've used Thunks that can access the whole address range. More efficient Thunks are possible. For example, the position-independent (PI) thunk could use the usual ADRP addressing mode, but this would limit the range to +/- 4 GiB. Ideally we would generate the limited-range thunks only when we know the destination is in range, but this would require changes and additional complexity in the underlying framework, so I chose to keep it simple.
  • The test aarch64-thunk-section-location.s checks that Thunks are placed roughly 128 MiB into a large Output Section. It takes a noticeable amount of time to assemble and link, so it may not be worth adding.
  • I don't have any really large (> 128 MiB) AArch64 programs to test against. I've checked a small test case (absolute and with --pie) that artificially adds large amounts of .space, to confirm that the thunks are created and execute correctly at runtime.
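Below is a sketch of the kind of whole-address-space thunk described in the first note, assuming a little-endian target. It loads the 64-bit destination from a literal placed right after the code and branches through x16 (IP0). IP0 is a scratch register that the AArch64 procedure call standard lets linker-inserted veneers clobber. The function name `writeAbsLongThunk` is hypothetical. The instruction words are the standard encodings of `ldr x16, #8` and `br x16`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a position-dependent long-branch thunk (16 bytes):
//   ldr x16, #8    ; load the 64-bit literal 8 bytes after this instruction
//   br  x16        ; branch to it
//   .quad target   ; absolute destination address
// Bytes are written little-endian, assuming a little-endian AArch64 target.
std::array<std::uint8_t, 16> writeAbsLongThunk(std::uint64_t target) {
  assert((target & 3) == 0 && "branch targets must be 4-byte aligned");
  const std::uint32_t ldrX16 = 0x58000050; // LDR (literal): imm19 = 8/4 = 2, Rt = 16
  const std::uint32_t brX16 = 0xd61f0200;  // BR x16: Rn = 16
  std::array<std::uint8_t, 16> buf{};
  // Store `v` as `n` little-endian bytes starting at `off`, independent of
  // the host's byte order.
  auto put = [&buf](std::size_t off, std::uint64_t v, int n) {
    for (int i = 0; i < n; ++i)
      buf[off + i] = static_cast<std::uint8_t>(v >> (8 * i));
  };
  put(0, ldrX16, 4);
  put(4, brX16, 4);
  put(8, target, 8);
  return buf;
}
```

The 8-byte literal is read as data, so this form of thunk must live in readable memory.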


Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

peter.smith created this revision. Nov 7 2017, 10:15 AM
peter.smith edited the summary of this revision. Nov 8 2017, 9:50 AM

Ping for review. This patch only adds support for AArch64 range-extension thunks to the existing framework; no extra complexity has been added to the Relocations implementation.

Adding in comment from [llvm-commits]

  • I've used Thunks that can access the whole address range. There are more efficient Thunks that can be used, for example the PI thunk could use the usual ADRP addressing mode, but this would limit the range to +/- 4Gb. In an ideal world we could generate the thunks with a limited range only when we know they are in range, but this will require some changes and additional complexity in the underlying framework so I chose to keep it simple.

Do you know what other linkers do? Do they start with an ADRP and upgrade the thunk if that is not sufficient?

In gold and bfd, an ADRP is used if the destination is within 4 GiB. If it drifts out of range, a longer-range "stub" very similar to the ones I'm proposing here is used.
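The gold/bfd approach described above amounts to a distance test when the stub is sized. This is a hypothetical sketch, and its names are illustrative rather than taken from gold, bfd or LLD. ADRP encodes a signed 21-bit count of 4 KiB pages, so a page delta in [-2^20, 2^20) fits, giving the +/- 4 GiB range. Anything farther gets the full 64-bit stub.

```cpp
#include <cstdint>

// Illustrative thunk kinds (hypothetical names, not gold/bfd/LLD types).
enum class ThunkKind { AdrpShort, AbsLong };

// ADRP holds a signed 21-bit count of 4 KiB pages: [-2^20, 2^20) pages,
// i.e. roughly +/- 4 GiB around the page containing `src`.
constexpr bool fitsInAdrp(std::uint64_t src, std::uint64_t dst) {
  // Both operands are page-aligned, so the division is exact.
  const std::int64_t pageDelta =
      static_cast<std::int64_t>((dst & ~std::uint64_t(0xfff)) -
                                (src & ~std::uint64_t(0xfff))) / 4096;
  return pageDelta >= -(std::int64_t(1) << 20) &&
         pageDelta < (std::int64_t(1) << 20);
}

// Pick the short ADRP stub when it reaches, otherwise the full 64-bit stub.
constexpr ThunkKind chooseThunk(std::uint64_t thunkAddr, std::uint64_t dst) {
  return fitsInAdrp(thunkAddr, dst) ? ThunkKind::AdrpShort : ThunkKind::AbsLong;
}
```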

On x86_64, supporting more than 4 GiB requires the compiler not to use the small code model. Is there something like that on AArch64?

Yes, AArch64 has a small and a large code model; the small code model only supports applications up to 4 GiB. AFAIK the large code model is implemented in clang and gcc for non-position-independent code, but not yet for position-independent code. So at this stage you could not realistically build a position-independent program larger than 4 GiB, though it is possible to build a non-position-independent executable that is.

Index: test/ELF/aarch64-thunk-section-location.s

--- /dev/null

+++ test/ELF/aarch64-thunk-section-location.s
@@ -0,0 +1,41 @@
+// RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu %s -o %t
+// RUN: ld.lld %t -o %t2 2>&1
+// RUN: llvm-objdump -d -start-address=134086664 -stop-address=134086676 -triple=aarch64-linux-gnu %t2 | FileCheck %s
+
+// Check that the range extension thunks are dumped close to the aarch64 branch
+// range of 128 MiB
+ .section .text.1, "ax", %progbits
+ .balign 0x1000
+ .globl _start
+_start:
+ bl high_target
+ ret
+
+ .section .text.2, "ax", %progbits
+ .space 0x2000000
+
+ .section .text.2, "ax", %progbits
+ .space 0x2000000
+
+ .section .text.3, "ax", %progbits
+ .space 0x2000000
+
+ .section .text.4, "ax", %progbits
+ .space 0x2000000 - 0x40000
+
+ .section .text.5, "ax", %progbits
+ .space 0x40000
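A little arithmetic checks the filler sizes in the test. The two `.text.2` directives name the same section, so their `.space` amounts accumulate. The total is exactly 0x8000000 bytes, i.e. 128 MiB or 2^27 bytes, which is just beyond the forward reach of a BL (2^27 - 4). Assume `high_target` is defined after these sections; the excerpt above is truncated. Then the BL in `.text.1` cannot reach past all of the filler, so the thunk has to be placed somewhere inside it. The constant names are only for illustration.

```cpp
#include <cstdint>

// Filler sizes from the test. Both `.section .text.2` directives name the
// same section, so its two `.space` amounts accumulate.
constexpr std::uint64_t text2 = 0x2000000 + 0x2000000;
constexpr std::uint64_t text3 = 0x2000000;
constexpr std::uint64_t text4 = 0x2000000 - 0x40000;
constexpr std::uint64_t text5 = 0x40000;
constexpr std::uint64_t filler = text2 + text3 + text4 + text5;

// The filler is exactly 2^27 bytes, the size of the B/BL forward range.
static_assert(filler == 0x8000000, "filler is 0x8000000 bytes");
static_assert(filler == (std::uint64_t(128) << 20), "0x8000000 bytes is 128 MiB");
```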

Why do you need multiple sections instead of a single .space with the
total amount? To show that the thunk could have been placed earlier but
was not?

Yes, exactly that. It isn't a particularly important test case; it just checks that, when possible, we place the thunk so that it is in range of as many callers as possible.

It is surprising that we are expected to support more than 4 GiB of code, but if that is an AArch64 requirement, LGTM.

Given that it isn't possible to build a position-independent executable larger than 4 GiB, I think it would be safe to:

  • Use the load-literal form for all non-position-independent thunks
  • Use ADRP for position-independent thunks

Does that sound preferable?

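Ideally I'd prefer to use ADRP for non-position-independent thunks as well, since they could then go into execute-only memory (the load-literal thunk reads its target address as data from the code section). However, I think supporting multiple ranges of thunks belongs in a separate patch.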

Peter

Updated diff to use ADRP for position-independent code. This thunk has a maximum range of +/- 4 GiB, which is not the whole address space. However, the default small code model only supports programs up to 4 GiB in size, and the large code model is not currently implemented in gcc or clang for position-independent code, so it is safe to use for position-independent thunks.
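For reference, here is a minimal sketch of a three-instruction ADRP thunk. It computes the instruction words directly instead of applying relocations. `adrp x16` takes the signed 21-bit page delta, hence +/- 4 GiB. `add x16, x16, #lo12` supplies the low 12 bits, and `br x16` jumps. The function names are hypothetical. The base words (0x90000010, 0x91000210, 0xd61f0200) are the standard encodings of these instructions with x16.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode `adrp x16, dst` for an instruction at address `pc`. Returns false if
// the page delta does not fit ADRP's signed 21-bit immediate (+/- 4 GiB).
bool encodeAdrpX16(std::uint64_t pc, std::uint64_t dst, std::uint32_t &out) {
  const std::int64_t pages =
      static_cast<std::int64_t>((dst & ~std::uint64_t(0xfff)) -
                                (pc & ~std::uint64_t(0xfff))) / 4096;
  if (pages < -(std::int64_t(1) << 20) || pages >= (std::int64_t(1) << 20))
    return false;
  // Split the 21-bit two's-complement immediate into immlo (bits 29-30)
  // and immhi (bits 5-23).
  const std::uint32_t imm = static_cast<std::uint32_t>(pages) & 0x1fffff;
  out = 0x90000000u | ((imm & 0x3) << 29) | ((imm >> 2) << 5) | 16u; // Rd = x16
  return true;
}

// Hypothetical position-independent thunk (12 bytes):
//   adrp x16, dst             ; 4 KiB page of dst, relative to the thunk
//   add  x16, x16, :lo12:dst  ; low 12 bits of dst
//   br   x16
bool writeAdrpThunk(std::uint64_t thunkAddr, std::uint64_t dst,
                    std::array<std::uint32_t, 3> &insns) {
  assert((dst & 3) == 0 && "branch targets must be 4-byte aligned");
  std::uint32_t adrp = 0;
  if (!encodeAdrpX16(thunkAddr, dst, adrp))
    return false; // beyond +/- 4 GiB: a long (absolute) thunk would be needed
  const std::uint32_t lo12 = static_cast<std::uint32_t>(dst & 0xfff);
  insns = {adrp, 0x91000210u | (lo12 << 10), // ADD x16, x16, #lo12
           0xd61f0200u};                     // BR x16
  return true;
}
```

A linker would normally emit these words from a fixed template patched by the R_AARCH64_ADR_PREL_PG_HI21 and R_AARCH64_ADD_ABS_LO12_NC relocations. Computing them directly just makes the range limit visible.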

I'll commit tomorrow if there are no further comments.

This revision was automatically updated to reflect the committed changes.
test/ELF/aarch64-call26-error.s