The PLT sequences that Arm currently uses are what gold and ld.bfd would use when the --long-plt option is used. These long sequences are relatively easy to understand and place no restriction on the displacement between the .plt and the .got.plt. If the maximum displacement between the .plt and the .got.plt is under 128 Megabytes, which is true for the vast majority of Arm executables and shared libraries, we can use a shorter PLT sequence that avoids a load.
This shorter instruction sequence is given in appendix A of ELF for the Arm Architecture (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044f/IHI0044F_aaelf.pdf). To reproduce the text here:
    ADD ip, pc, #-8:PC_OFFSET_27_20: __PLTGOT(X)   ; R_ARM_ALU_PC_G0_NC(__PLTGOT(X))
    ADD ip, ip, #-4:PC_OFFSET_19_12: __PLTGOT(X)   ; R_ARM_ALU_PC_G1_NC(__PLTGOT(X))
    LDR pc, [ip, #0:PC_OFFSET_11_0: __PLTGOT(X)]!  ; R_ARM_LDR_PC_G2(__PLTGOT(X))
The ADD instructions use the "modified immediate" encoding, in which an 8-bit immediate is rotated right by an even number of bits. For example, 0xNN can be placed in bits 27:20 with a rotate right of 12, or in bits 19:12 with a rotate right of 20. The example in the ABI document uses the "Group Relocations"; these attempt to find the best sequence of modified immediates to represent an arbitrary constant. As the implementation of the group relocations is complex, I've chosen to follow gold and bfd's lead and hard-code the rotations, so that we can only represent constants of the form 0x0NNNNNNN by:
    ADD ip, pc, #0x0NN00000
    ADD ip, ip, #0x000NN000
    LDR pc, [ip, #0x00000NNN]!
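To make the fixed rotations concrete, here is a minimal Python sketch of how an offset of the form 0x0NNNNNNN could be split across the three instructions. The helper names (ror32, split_offset, encode_short_plt) are illustrative and not taken from the patch. The base opcodes are the standard A32 encodings of the three instructions above with a zero immediate; treat them as an assumption to check against the Arm ARM rather than as the patch's own constants.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def split_offset(offset):
    """Split an offset of the form 0x0NNNNNNN into the three immediates."""
    if offset < 0 or offset > 0x0FFFFFFF:
        raise ValueError("offset is not of the form 0x0NNNNNNN")
    hi = offset & 0x0FF00000   # bits 27:20, added by the first ADD
    mid = offset & 0x000FF000  # bits 19:12, added by the second ADD
    lo = offset & 0x00000FFF   # bits 11:0, the LDR immediate
    return hi, mid, lo


def encode_short_plt(offset):
    """Return the three instruction words for the short sequence.

    Assumed A32 base encodings (immediate field zero):
      0xE28FC600  add ip, pc, #imm8 ror 12
      0xE28CCA00  add ip, ip, #imm8 ror 20
      0xE5BCF000  ldr pc, [ip, #imm12]!
    """
    hi, mid, lo = split_offset(offset)
    return [
        0xE28FC600 | (hi >> 20),
        0xE28CCA00 | (mid >> 12),
        0xE5BCF000 | lo,
    ]


if __name__ == "__main__":
    for word in encode_short_plt(0x01234567):
        print(f"{word:#010x}")
```

Because the rotations are fixed, each 8-bit field is simply masked out of the offset and shifted into the low byte of its ADD; nothing has to search for the best rotation, which is the part that makes the group relocations harder to implement.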
Other design decisions:
- I've chosen to pad out the entry to 16 bytes so that the header and the entries are all 16-byte aligned.
- The previous PLT entries are now available under the --long-plt option.
- If the offset from the .plt to the .got.plt cannot be encoded, we give an error suggesting the use of the --long-plt option.
- I've updated the existing tests to pass --long-plt rather than changing their expected output to the short sequences. I'm quite happy to update all the tests instead if people prefer.
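The range check and the padding decision can be sketched as follows. This is a hypothetical Python illustration, not the code in the patch: the function names, the error wording, and the filler byte are all assumptions. The 128 MiB limit is the figure quoted in this description, and the sketch assumes the .got.plt entry sits at a higher address than the PLT entry, since the hard-coded ADDs can only add a positive constant.

```python
SHORT_PLT_LIMIT = 128 * 1024 * 1024  # limit quoted in the description
PLT_ENTRY_SIZE = 16                  # entries are padded to 16 bytes


def short_plt_offset(plt_entry_addr, got_plt_entry_addr):
    """Return the offset the short sequence must add to PC.

    In Arm state PC reads as the address of the current
    instruction plus 8, so the offset is relative to that value.
    """
    offset = got_plt_entry_addr - (plt_entry_addr + 8)
    if not 0 <= offset < SHORT_PLT_LIMIT:
        # Hypothetical message; the real diagnostic text may differ.
        raise ValueError(
            "offset from .plt to .got.plt cannot be encoded; "
            "consider using --long-plt"
        )
    return offset


def pad_entry(code, filler=b"\x00"):
    """Pad a 12-byte instruction sequence to the 16-byte entry size.

    The filler byte is a placeholder chosen for this sketch.
    """
    if len(code) > PLT_ENTRY_SIZE:
        raise ValueError("PLT entry is larger than 16 bytes")
    return code + filler * (PLT_ENTRY_SIZE - len(code))


if __name__ == "__main__":
    print(hex(short_plt_offset(0x1000, 0x2000)))
    print(len(pad_entry(b"\x00" * 12)))
```

Padding the three-instruction sequence with one extra word costs 4 bytes per entry, but it keeps every entry the same size as the long form and keeps entry addresses 16-byte aligned.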
An alternative implementation using the group relocations is possible. In some cases (when the offset between .plt and .got.plt contains zero bits that the group relocations can skip over) we could get a longer range. This does make the implementation significantly more complex though. I'm happy to do this as a follow-up if there is demand.