This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF] Optimize Arm PLT entries
ClosedPublic

Authored by peter.smith on Dec 14 2017, 9:32 AM.

Details

Summary

The PLT sequences that LLD currently uses for Arm are what gold and ld.bfd use when the --long-plt option is given. These long sequences are relatively easy to understand and place no restriction on the displacement between the .plt and the .got.plt. If the maximum displacement between the .plt and the .got.plt is under 128 Megabytes, which is true for the vast majority of Arm executables and shared libraries, we can use a shorter PLT sequence that avoids a load.

This shorter instruction sequence is given in appendix A of ELF for the Arm Architecture (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044f/IHI0044F_aaelf.pdf). To reproduce the text here:

ADD ip, pc, #-8:PC_OFFSET_27_20: __PLTGOT(X) ; R_ARM_ALU_PC_G0_NC(__PLTGOT(X))
ADD ip, ip, #-4:PC_OFFSET_19_12: __PLTGOT(X) ; R_ARM_ALU_PC_G1_NC(__PLTGOT(X))
LDR pc, [ip, #0:PC_OFFSET_11_0: __PLTGOT(X)]! ; R_ARM_LDR_PC_G2(__PLTGOT(X))

The ADD instructions use the "modified immediate" encoding: an 8-bit immediate is rotated right by an even number of bits. For example, 0xNN can be placed in bits 27:20 with a rotate right of 12, or in bits 19:12 with a rotate right of 20. The example in the ABI document uses the "Group Relocations"; these attempt to find the best sequence of modified immediates to represent an arbitrary constant. As the implementation of the group relocations is complex, I've chosen to follow gold and bfd's lead and hard-code the rotations, so that we can only represent constants of the form 0x0NNNNNNN, using:

ADD ip, pc, #0x0NN00000
ADD ip, ip, #0x000NN000
LDR pc, [ip, #0x00000NNN]!
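
The fixed-rotation split above can be sketched as follows. This is an illustration in Python, not code from the patch (which is C++); the function names are hypothetical. It assumes the standard A32 modified-immediate layout, a 4-bit rotate field above an 8-bit immediate, with the rotation amount being twice the field value.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_modified_immediate(imm12):
    """Expand an A32 modified immediate: imm8 rotated right by 2 * rot4."""
    rot4 = (imm12 >> 8) & 0xF
    imm8 = imm12 & 0xFF
    return ror32(imm8, 2 * rot4)


def split_offset(offset):
    """Split an offset of the form 0x0NNNNNNN into the three immediate fields.

    Hypothetical helper: returns the 12-bit immediate fields for the two
    ADD instructions and the 12-bit offset for the LDR.
    """
    if not 0 <= offset <= 0x0FFFFFFF:
        raise ValueError("offset does not have the form 0x0NNNNNNN")
    add_hi = (6 << 8) | ((offset >> 20) & 0xFF)    # rot4 = 6: ROR 12, bits 27:20
    add_mid = (10 << 8) | ((offset >> 12) & 0xFF)  # rot4 = 10: ROR 20, bits 19:12
    ldr_lo = offset & 0xFFF                        # plain 12-bit LDR offset
    return add_hi, add_mid, ldr_lo
```

Decoding the two ADD fields and adding the LDR offset gives back the original value, which is the property the three-instruction sequence relies on.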

Other design decisions:

  • I've chosen to pad out the entry to 16 bytes so that the header and the entries are all 16-byte aligned.
  • The previous PLT entries are now available under the --long-plt option.
  • If the offset from the .plt to the .got.plt cannot be encoded, we give an error suggesting the use of the --long-plt option.
  • I've switched the existing tests to use --long-plt rather than updating their expected output for the new sequence. I'm quite happy to update them all instead if people prefer.

An alternative implementation using the group relocations is possible. In some cases (when the offset between .plt and .got.plt contains runs of zero bits) we could get a longer range. This does make the implementation significantly more complex though. I'm happy to do this as a follow-up if there is demand.
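
To illustrate why a group-relocation style split can reach further, here is a rough Python sketch. It is loosely modelled on the idea in the ABI document (each ADD takes the most significant 8 bits that start at an even bit position, and the LDR takes whatever remains); it is not what this patch implements, and the function names are hypothetical.

```python
def alu_group_chunk(value):
    """Return (chunk, residual) for one ADD.

    The chunk is the top 8 bits of `value`, starting at an even bit
    position, so it is encodable as a modified immediate.
    """
    if value == 0:
        return 0, 0
    leading_zeros = (32 - value.bit_length()) & ~1  # round down to even
    mask = 0xFF000000 >> leading_zeros
    return value & mask, value & ~mask


def group_split(offset):
    """Split `offset` into two ADD immediates and an LDR offset.

    Returns None when the residual left for the LDR exceeds 12 bits.
    """
    if not 0 <= offset <= 0xFFFFFFFF:
        return None
    g0, rest = alu_group_chunk(offset)
    g1, rest = alu_group_chunk(rest)
    if rest > 0xFFF:
        return None
    return g0, g1, rest
```

Under these assumptions an offset such as 0x20004008 splits into 0x20000000, 0x4000 and 0x008, even though it is not of the form 0x0NNNNNNN, while a dense value such as 0x1FFFFFFF still cannot be represented.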

Event Timeline

peter.smith created this revision. Dec 14 2017, 9:32 AM
pcc added a subscriber: pcc. Dec 14 2017, 10:20 AM

Instead of making the behaviour conditional on the flag, could we always emit the long PLT header and then conditionally emit short or long PLTs depending on what the offset is?

In D41246#955551, @pcc wrote:

Instead of making the behaviour conditional on the flag, could we always emit the long PLT header and then conditionally emit short or long PLTs depending on what the offset is?

That sounds like it should work given that the entries are the same size. It would mean the lazy PLT lookup would be a little bit slower, but that is probably a reasonable trade off. Unless anyone has any objections to the idea I'm happy to give that a go tomorrow.

ruiu added a comment. Dec 14 2017, 5:37 PM

That sounds like it should work given that the entries are the same size. It would mean the lazy PLT lookup would be a little bit slower, but that is probably a reasonable trade off. Unless anyone has any objections to the idea I'm happy to give that a go tomorrow.

I think pcc's suggestion is a good idea.

I've implemented the suggestion to choose the form of PLT entry based on whether the offset to the .got.plt can be represented by the shorter encoding. To maintain 16-byte alignment of the entries, I've increased the size of the header to 32 bytes. As the --long-plt option no longer does anything, I've updated all the tests to use the new encoding. Apologies for the size of the diff.
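
A minimal sketch of the per-entry choice described above, in Python for illustration only (the patch is C++ and its actual code may differ). The names are hypothetical. The 32-byte header and 16-byte entries come from this comment, and the limit is the "under 128 Megabytes" figure from the summary. The sketch also assumes the usual A32 behaviour that reading pc yields the address of the instruction plus 8.

```python
PLT_HEADER_SIZE = 32   # header size after this update
PLT_ENTRY_SIZE = 16    # each entry is padded to 16 bytes
SHORT_PLT_LIMIT = 128 * 1024 * 1024  # "under 128 Megabytes", from the summary


def plt_entry_address(plt_base, index):
    """Address of the index-th PLT entry, following the header."""
    return plt_base + PLT_HEADER_SIZE + PLT_ENTRY_SIZE * index


def choose_plt_form(plt_entry_addr, got_plt_entry_addr):
    """Return "short" if the offset fits the short sequence, else "long"."""
    # Assumption: the first ADD reads pc as its own address plus 8.
    offset = got_plt_entry_addr - (plt_entry_addr + 8)
    return "short" if 0 <= offset < SHORT_PLT_LIMIT else "long"
```

Because both forms occupy the same 16 bytes, the decision can be made independently for each entry without disturbing the addresses of its neighbours.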

This revision was not accepted when it landed; it landed in state Needs Review. Dec 18 2017, 6:47 AM
This revision was automatically updated to reflect the committed changes.