This is an archive of the discontinued LLVM Phabricator instance.

[ELF][AArch64] Support R_AARCH64_{CALL26,JUMP26} range extension thunks with addends
ClosedPublic

Authored by MaskRay on Nov 23 2019, 2:55 PM.

Details

Summary

Fixes AArch64 part of PR40438

The current range extension thunk framework does not handle a relocation
relative to a STT_SECTION symbol with a non-zero addend, which may be
used by jumps/calls to local functions on some RELA targets (AArch64,
powerpc ELFv1, powerpc64 ELFv2, etc). See PR40438 and the following
code for examples:

// clang -target $target a.cc
// .text.cold may be placed in a separate output section.
// The distance between bar in .text.cold and foo in .text may be larger than 128MiB.
static void foo() {}
__attribute__((section(".text.cold"))) static int bar() { foo(); return
0; }
__attribute__((used)) static int dummy = bar();

This patch makes such thunks with addends work for AArch64. The target
independent part can be reused by PPC in the future.

On REL targets (ARM, MIPS), jumps/calls are not represented as
STT_SECTION + non-zero addend (see
MCELFObjectTargetWriter::needsRelocateWithSymbol), so they don't need
this feature, but we need to make sure this patch does not affect them.

Event Timeline

MaskRay created this revision.Nov 23 2019, 2:55 PM

Will look at the code in detail this afternoon.

On REL targets (ARM, MIPS), jumps/calls are not represented as STT_SECTION + non-zero addend (see MCELFObjectTargetWriter::needsRelocateWithSymbol), so they don't need this feature, but we need to make sure this patch does not affect them.

On Arm using a STT_FUNC symbol is required to get an interworking thunk (LLD isn't strictly compliant here, see appendix) so using a branch to STT_SECTION symbol + offset may not work reliably unless the caller knows the destination execution state and can do the interworking at the call site. I've seen MIPS use the bottom bit of the symbol STO_MIPS_MICROMIPS but I don't know the details.

It is possible to write in assembler

bl sym + 8

Where the +8 will be added to the PC-bias correction (-8 in Arm and -4 in Thumb) to give an immediate of 0 in Arm or +4 in Thumb. So this is at least possible to encounter in Thumb. As far as I know compilers won't generate this though. This could be supported on Arm at least by reading the implicit addend rather than assuming it is either -8 in Arm or -4 in Thumb as we do now.

On ARM, the compensation of implicit addends is done in ARM::needsThunk. This makes it tricky not to affect ARM. I feel that the code can be generalized (see config->isRela introduced in the patch) if we reorganize the code (ARM::needsThunk).

Can you elaborate? My understanding is that needsThunk() and inBranchRange() assume all Arm branches to have an immediate (which becomes the relocation addend) of -8 and Thumb branches to have -4. This cancels the PC-bias of 8-bytes in Arm and 4-bytes in Thumb.

This is my current thinking. If the addend from the branch immediate, whether read from the instruction, or assumed to be -8 for an Arm branch or -4 for a Thumb branch, is passed in to needsThunk() I think that we could do:

src = branchAddr +  (Is Thumb branch then + 4, else +8) 
dst = (expr == R_PLT_PC) ? s.getPltVA(addend) : s.getVA(addend);

The code sequence:

// PC at Src is 2 instructions ahead, immediate of branch is signed
if (src > dst)
  range -= 2 * instrSize;
else
  range += instrSize;

if ((dst & 0x1) == 0)
  // Destination is ARM, if ARM caller then Src is already 4-byte aligned.
  // If Thumb Caller (BLX) the Src address has bottom 2 bits cleared to ensure
  // destination will be 4 byte aligned.
  src &= ~0x3;
else
  // Bit 0 == 1 denotes Thumb state, it is not part of the range
  dst &= ~0x1;

I think should be unaffected. This is accounting for the implicit PC-bias and the alignment requirements when calculating what immediates are encodable.

Appendix/Aside:
Strictly speaking Arm/Thumb interworking is only done for STT_FUNC symbols. LLD does this for all symbols as we have to choose between a BL or BLX in relocateOne which doesn't have access to Symbol to check the type. There are also some corner case rules to do with how the thumb bit is processed in relocations that we don't follow. Strictly, for a STT_FUNC the value of the symbol should strip out the Thumb bit before adding the addend, the Thumb bit is then orred in. For example some relocations are defined as ((S + A) | T) - P. This can matter when the addend is 1 modulo 2, which in practice never happens outside of assembly when someone is trying to manually interwork. I would like to fix this but it will mean adding Symbol, or at least the result of S.isFunc() as a parameter to relocateOne(). If that doesn't sound too objectionable I'd be happy to work on it.

I'm comfortable with the detailed code-changes. I initially thought that using a nested pair might make the code too difficult to read, but it is clear at the construction of the pair which field is which. I'd definitely want to avoid referring the elements of the pair with first.first or first.second, but as we only use it as a key it doesn't matter.

MaskRay updated this revision to Diff 230953.Nov 25 2019, 11:30 AM

Make code more general

Rebase on D70690

Can you elaborate? My understanding is that needsThunk() and inBranchRange() assume all Arm branches to have an immediate (which becomes the relocation addend) of -8 and Thumb branches to have -4. This cancels the PC-bias of 8-bytes in Arm and 4-bytes in Thumb.

This is my current thinking. If the addend from the branch immediate, whether read from the instruction, or assumed to be -8 for an Arm branch or -4 for a Thumb branch, is passed in to needsThunk() I think that we could do:

src = branchAddr + (Is Thumb branch then + 4, else +8)
dst = (expr == R_PLT_PC) ? s.getPltVA(addend) : s.getVA(addend);

Created D70690 as a prerequisite change to make this patch affect ARM even less.

The update in test/ELF/arm-thumb-mix-range-thunk-os.s is related to the following diff:

if (auto *d = dyn_cast<Defined>(rel.sym))
  if (!d->isInPlt() && d->section)
    thunkVec = &thunkedSymbolsBySectionAndAddend[{
        {d->section->repl, d->value}, rel.addend}];  // If I use 0 instead of rel.addend,
if (!thunkVec)
  thunkVec = &thunkedSymbols[{rel.sym, rel.addend}]; // If I use 0 instead of rel.addend,  test/ELF/arm-thumb-mix-range-thunk-os.s will not need an update.

The R_ARM_THM_CALL in tfunc33 causes creation of a Thumb thunk. The R_ARM_CALL in tfunc34 have different implicit addends, so making rel.addend part of the key will create an ARM thunk, rather than reuse the existing Thumb thunk. If reusing the thunk is the intended behavior, I'll change it to

// This ternary expression makes the code less general.
int64_t addend = config->emachine == EM_ARM ? 0 : rel.addend;

if (auto *d = dyn_cast<Defined>(rel.sym))
  if (!d->isInPlt() && d->section)
    thunkVec = &thunkedSymbolsBySectionAndAddend[{
        {d->section->repl, d->value}, addend}];
if (!thunkVec)
  thunkVec = &thunkedSymbols[{rel.sym, addend}];

can keep the test unchanged.

I would like to fix this but it will mean adding Symbol, or at least the result of S.isFunc() as a parameter to relocateOne(). If that doesn't sound too objectionable I'd be happy to work on it.

I support that we add Symbol to the parameter list of relocateOne. This can improve reportRangeError messages as well. Right now we don't report which symbol has the problem. With the Symbol information the message can be more specific.

Can you elaborate? My understanding is that needsThunk() and inBranchRange() assume all Arm branches to have an immediate (which becomes the relocation addend) of -8 and Thumb branches to have -4. This cancels the PC-bias of 8-bytes in Arm and 4-bytes in Thumb.

This is my current thinking. If the addend from the branch immediate, whether read from the instruction, or assumed to be -8 for an Arm branch or -4 for a Thumb branch, is passed in to needsThunk() I think that we could do:

src = branchAddr + (Is Thumb branch then + 4, else +8)
dst = (expr == R_PLT_PC) ? s.getPltVA(addend) : s.getVA(addend);

Created D70690 as a prerequisite change to make this patch affect ARM even less.

The update in test/ELF/arm-thumb-mix-range-thunk-os.s is related to the following diff:

if (auto *d = dyn_cast<Defined>(rel.sym))
  if (!d->isInPlt() && d->section)
    thunkVec = &thunkedSymbolsBySectionAndAddend[{
        {d->section->repl, d->value}, rel.addend}];  // If I use 0 instead of rel.addend,
if (!thunkVec)
  thunkVec = &thunkedSymbols[{rel.sym, rel.addend}]; // If I use 0 instead of rel.addend,  test/ELF/arm-thumb-mix-range-thunk-os.s will not need an update.

The R_ARM_THM_CALL in tfunc33 causes creation of a Thumb thunk. The R_ARM_CALL in tfunc34 have different implicit addends, so making rel.addend part of the key will create an ARM thunk, rather than reuse the existing Thumb thunk. If reusing the thunk is the intended behavior, I'll change it to

// This ternary expression makes the code less general.
int64_t addend = config->emachine == EM_ARM ? 0 : rel.addend;

if (auto *d = dyn_cast<Defined>(rel.sym))
  if (!d->isInPlt() && d->section)
    thunkVec = &thunkedSymbolsBySectionAndAddend[{
        {d->section->repl, d->value}, addend}];
if (!thunkVec)
  thunkVec = &thunkedSymbols[{rel.sym, addend}];

can keep the test unchanged.

Yes it is intentional, an Arm or Thumb BL instruction can be changed into a BLX instruction that changes state, so it is possible for both an Arm and Thumb relocation to share the same Thunk.

Instead of

// This ternary expression makes the code less general.
int64_t addend = config->emachine == EM_ARM ? 0 : rel.addend;

May I suggest

int64_t addend = rel.addend + getPCBias(type, rel.addend);

This could be virtual function for Target, but could just be static function that in effect cancels -4 or -8 to 0 on Arm, but still accounts for non -4 or -8 addends. The body would be something like:

  if (Config->EMachine != EM_ARM)
    return 0;
  switch (type) {
  case R_ARM_THM_JUMP19:
  case R_ARM_THM_JUMP24:
  case R_ARM_THM_CALL:
    return 4;
  default:
    return 8;
  }
}

This is more code, but it is a bit more general.

I would like to fix this but it will mean adding Symbol, or at least the result of S.isFunc() as a parameter to relocateOne(). If that doesn't sound too objectionable I'd be happy to work on it.

I support that we add Symbol to the parameter list of relocateOne. This can improve reportRangeError messages as well. Right now we don't report which symbol has the problem. With the Symbol information the message can be more specific.

MaskRay updated this revision to Diff 231099.Nov 26 2019, 10:18 AM
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a subscriber: wuzish.

Rebase after D70690 update (adopt peter.smith's suggestion that using getPCBias instead of inBranchRangeBeforePCBias)

peter.smith accepted this revision.Nov 27 2019, 6:40 AM

I'm happy with this change. It would be worth waiting a bit to see if there are any comments from other reviewers.

lld/ELF/Relocations.cpp
1822

Typo: addened -> addend

This revision is now accepted and ready to land.Nov 27 2019, 6:40 AM
MaskRay updated this revision to Diff 231282.Nov 27 2019, 9:20 AM
MaskRay marked an inline comment as done.

Fix a typo in the comment

This revision was automatically updated to reflect the committed changes.