
[AArch64][ELF][llvm-objdump] Add support for PLT decoding with BTI instructions present

Authored by peter.smith on May 29 2019, 8:16 AM.



Arm Architecture v8.5a introduces Branch Target Identification (BTI). When enabled, all indirect branches must target a bti instruction of the appropriate form. As PLT sequences may sometimes be the target of an indirect branch, and PLT[0] always is, a static linker may need to generate PLT sequences that contain "bti c" as the first instruction. In effect:

bti     c
adrp    x16, page offset to .got.plt

Instead of:

adrp    x16, page offset to .got.plt

At present the PLT decoding assumes the adrp will always be the first instruction. This patch adds support for a single optional "bti c" to prefix the adrp. A test binary has been uploaded with such a PLT sequence. The existing code already supports the PAC PLT sequence that adds an AUTIA1716 before the BR X17. A forthcoming LLD patch will make heavy use of the PLT decoding code.
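
A minimal sketch of the entry-start check described above, not the code from this patch. The names `BtiC`, `isAdrp` and `pltEntryAdrpIndex` are hypothetical; the sketch assumes the instruction words have already been read as little-endian 32-bit values, that `bti c` is the word 0xd503245f, and that adrp is recognised by bit 31 being set with bits 28:24 equal to 0b10000.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names; an illustration, not the llvm-objdump implementation.
constexpr std::uint32_t BtiC = 0xd503245f;     // bti c
constexpr std::uint32_t AdrpMask = 0x9f000000; // bit 31 and bits 28:24
constexpr std::uint32_t AdrpBits = 0x90000000; // adrp, any register/immediate

constexpr bool isAdrp(std::uint32_t Insn) {
  return (Insn & AdrpMask) == AdrpBits;
}

// Returns the index of the adrp that begins the PLT entry at Index, or -1 if
// no entry starts there. A single optional "bti c" may precede the adrp.
constexpr int pltEntryAdrpIndex(const std::uint32_t *Insns, std::size_t Count,
                                std::size_t Index) {
  if (Index >= Count)
    return -1;
  std::size_t I = Index;
  if (Insns[I] == BtiC) {
    ++I; // skip the optional landing pad
    if (I >= Count)
      return -1;
  }
  return isAdrp(Insns[I]) ? static_cast<int>(I) : -1;
}
```

A caller would walk the PLT contents one word at a time and treat any index for which this returns a non-negative value as the start of an entry.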

The encoding for BTI c can be found in



Event Timeline

peter.smith created this revision.May 29 2019, 8:16 AM
grimar added inline comments.May 30 2019, 2:17 AM
8 ↗(On Diff #201919)

Should we use yaml2obj instead of adding one more precompiled binary?

9 ↗(On Diff #201919)

Let's not mix the // and # comment markers.

peter.smith marked 2 inline comments as done.May 30 2019, 2:35 AM

Thanks for the comments, will have an update later today.

8 ↗(On Diff #201919)

I'll give it a go. In theory I need to put a PLT section with instructions in so if I can do that with yaml2obj I'll do that.

9 ↗(On Diff #201919)

Thanks for the spot, will fix.

grimar added inline comments.May 30 2019, 2:44 AM
8 ↗(On Diff #201919)

Yes, you should be able to set any Content for section.
Just use obj2yaml on that binary, it should produce a workable yaml (which you might want to reduce though).

e.g. this is yaml produced from cfi.elf-aarch64 input also used in this test file:

- Name:            .plt
  Type:            SHT_PROGBITS
  Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
  Address:         0x00000000000002D8
  AddressAlign:    0x0000000000000008
  EntSize:         0x0000000000000010
  Content:         F07BBFA9F00000F011FE47F910E23F9120021FD61F2003D51F2003D51F2003D510010090110240F91002009120021FD6
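
For readers mapping a Content string back to instructions, here is a small sketch of how each group of eight hex digits corresponds to one little-endian 32-bit AArch64 instruction word. The helper names `hexDigit` and `wordAt` are hypothetical and the input is assumed to be valid hex.

```cpp
#include <cstdint>

// Value of one hexadecimal digit; assumes the character is valid hex.
constexpr std::uint32_t hexDigit(char C) {
  return (C >= '0' && C <= '9')   ? static_cast<std::uint32_t>(C - '0')
         : (C >= 'A' && C <= 'F') ? static_cast<std::uint32_t>(C - 'A' + 10)
                                  : static_cast<std::uint32_t>(C - 'a' + 10);
}

// Decode the little-endian 32-bit word that starts at hex digit Offset
// (8 digits = 4 bytes) of a Content-style string.
constexpr std::uint32_t wordAt(const char *Hex, unsigned Offset) {
  std::uint32_t Word = 0;
  for (unsigned Byte = 0; Byte < 4; ++Byte) {
    std::uint32_t Value = hexDigit(Hex[Offset + 2 * Byte]) * 16 +
                          hexDigit(Hex[Offset + 2 * Byte + 1]);
    Word |= Value << (8 * Byte); // first byte in the string is least significant
  }
  return Word;
}
```

With the content above, `wordAt("F07BBFA9", 0)` evaluates to 0xa9bf7bf0, which is the `stp x16, x30, [sp, #-16]!` that opens PLT[0].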

Updated test to use yaml2obj. I couldn't get the obj2yaml output to work out of the box, but after heavily cutting it down both yaml2obj and llvm-objdump accept it.

grimar added inline comments.May 31 2019, 1:58 AM
8 ↗(On Diff #202227)

You do not need --docnum=1, because there is only one document in the input.
I would probably inline it right here (we often do that in test cases).
Also, seems that having 2 plt entries and 2 symbols is enough and I was able to simplify the YAML a bit more.
What do you think about version below?

# RUN: llvm-objdump -d %p/Inputs/cfi.elf-aarch64 | FileCheck %s

# CHECK: Disassembly of section .plt:
# CHECK: __cfi_slowpath@plt:
# CHECK-NEXT: adrp      x16, {{.*}}
# CHECK: bl {{.*}} <__cfi_slowpath@plt>

# RUN: yaml2obj %s -o %t.aarch64
# RUN: llvm-objdump -d -mattr=+bti %t.aarch64 | \
# RUN:   FileCheck --check-prefix=CHECK-BTI %s
# CHECK-BTI: bl {{.*}} <f1@plt>
# CHECK-BTI: bl {{.*}} <f2@plt>
# CHECK-BTI: Disassembly of section .plt:
# CHECK-BTI: f1@plt:
# CHECK-BTI-NEXT: bti   c
# CHECK-BTI-NEXT: adrp  x16, {{.*}}
# CHECK-BTI: f2@plt:
# CHECK-BTI-NEXT: bti   c
# CHECK-BTI-NEXT: adrp  x16, {{.*}}

--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_EXEC
  Machine: EM_AARCH64
Sections:
  - Name:    .rela.plt
    Type:    SHT_RELA
    Flags:   [ SHF_ALLOC ]
    EntSize: 0x0000000000000018
    Info:    .got.plt
    Relocations:
      - Offset: 0x0000000000230018
        Symbol: f1
        Type:   R_AARCH64_JUMP_SLOT
      - Offset: 0x0000000000230020
        Symbol: f2
        Type:   R_AARCH64_JUMP_SLOT
  - Name:    .text
    Type:    SHT_PROGBITS
    Address: 0x0000000000210000
    Content: 0C00009411000094C0035FD6
  - Name:    .plt
    Type:    SHT_PROGBITS
    Address: 0x0000000000210010
    Content: 5F2403D5F07BBFA910010090110A40F91042009120021FD61F2003D51F2003D55F2403D510010090110E40F9106200919F2103D520021FD65F2403D510010090111240F9108200919F2103D520021FD6
  - Name:    .got.plt
    Type:    SHT_PROGBITS
    Content: '000000000000000000000000000000000000000000000000100021000000000010002100000000001000210000000000'
Symbols:
  - Name:    f1
    Type:    STT_FUNC
    Binding: STB_GLOBAL
  - Name:    f2
    Type:    STT_FUNC
    Binding: STB_GLOBAL

Thanks for the suggestion, I'll update on Monday.

MaskRay added a subscriber: MaskRay.Jun 1 2019, 5:42 AM
MaskRay added inline comments.
1 ↗(On Diff #202227)

--no-show-raw-insn to avoid bl {{.*}} <f1@plt> below.

MaskRay added inline comments.Jun 1 2019, 6:12 AM
171 ↗(On Diff #202227)

I'm not familiar with the encoding, but does this mean other bit patterns (e.g. 0xffffffff) should also be accepted?

MaskRay removed a subscriber: MaskRay.
peter.smith marked 2 inline comments as done.Jun 3 2019, 4:14 AM
peter.smith added inline comments.
171 ↗(On Diff #202227)

Apologies I'm not sure I understand. I've put what I know in this comment. Please let me know if I've missed something?

At the moment I don't think any other bit-patterns are necessary. As I understand it the function will look for either:

bti c

or

adrp

To denote the start of a PLT entry. If neither of those is found then we'll skip all instructions till we find one. All but one type of AArch64 PLT entry follows this form. There is one other "pseudo" PLT entry type that can be generated by ld.bfd (but not LLD) for the lazy resolver of TLSDESC variables. If it is present there is always only one entry at the end of the PLT. It can be generated by the following:

extern __thread int val __attribute__((tls_model("global-dynamic")));

int func() {
    return val;
}

aarch64-linux-gnu-gcc  tls.c -O2 -fpic --shared -o

It is of the form:

  stp   x2, x3, [sp, #-16]!
  adrp  x2, DT_TLSDESC_GOT
  adrp  x3,  PLTGOT
  ldr   x2, [x2, :lo12:DT_TLSDESC_GOT]
  add   x3, x3, :lo12:PLTGOT
  br    x2

Note that gnu objdump doesn't seem to disassemble it as part of the PLT, so I've not added any support for it. I guess that could come as a separate patch if we wanted to.

I'm looking to match exactly one instruction BTI with exactly one type of operand "c" so the encoding can be precise.

The encoding for BTI c (quoting from DDI_0596_ARM_a64_instruction_set_architecture.pdf link in the description)

1 1 0 1   0 1 0 1   0 0 0 0   0 0 1 1   0 0 1 0   0 1 0 0   x x 0 1   1 1 1 1
   d         5         0         3         2         4      x x 0 1      f

where xx is:

0 0   (no operand)
0 1   c
1 0   j
1 1   jc

For BTI c we'd want 0 1 0 1, which is 5.

Hope that answers the question?
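
As a hedged illustration of the encoding described above, the sketch below builds each BTI word from the two-bit operand field. `BtiTarget` and `encodeBti` are hypothetical names; the operand values for the forms other than `c` are taken from the Arm A64 instruction set documentation rather than from this review.

```cpp
#include <cstdint>

// Two-bit operand field "xx", held in bits 7:6 of the instruction word.
enum class BtiTarget : std::uint32_t { None = 0, C = 1, J = 2, JC = 3 };

// Plain "bti" is 0xd503241f; the operand is OR-ed into bits 7:6.
constexpr std::uint32_t encodeBti(BtiTarget Target) {
  return 0xd503241fu | (static_cast<std::uint32_t>(Target) << 6);
}

static_assert(encodeBti(BtiTarget::C) == 0xd503245fu, "bti c");
```

The nibble containing the operand is `xx01`, so `c` (01) gives 0101, i.e. the 5 in 0xd503245f.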

1 ↗(On Diff #202227)

The regex here will be for the offset in the branch which --no-show-raw-insn doesn't affect. It comes out as something like bl 48 <f1@plt>. I've kept the regex for now as I'm not sure how stable the yaml2obj output will be.

Incorporated George's test case suggestions (Thank You!)

MaskRay added inline comments.Jun 3 2019, 8:09 AM
171 ↗(On Diff #202227)

Thanks for the explanation! Then I think this should probably be: (Inst | 0xc0) == 0xd50324df

(Insn & 0xd503245f) == 0xd503245f accepts many other irrelevant bit patterns like 0xffffffff.

peter.smith marked an inline comment as done.Jun 3 2019, 9:18 AM
peter.smith added inline comments.
171 ↗(On Diff #202227)

I see what you mean now; apologies, I should have spotted that! I think that as we are only interested in "BTI c", which matches a single bit pattern, a simple (Inst == 0xd503245f) will do. Will post an updated patch in a sec.

Corrected matching of "BTI c" to only match "BTI c".
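
To make the difference between the three predicates discussed in this thread concrete, here is a small sketch. The function names are hypothetical, and the exact comparison uses 0xd503245f, the `bti c` word that follows from the encoding quoted earlier.

```cpp
#include <cstdint>

// Mask test: true whenever every set bit of the pattern is also set in Insn,
// so unrelated words such as 0xffffffff pass.
constexpr bool maskMatch(std::uint32_t Insn) {
  return (Insn & 0xd503245fu) == 0xd503245fu;
}

// Force the operand bits 7:6 to 1 and compare: accepts bti, bti c, bti j and
// bti jc, and nothing else.
constexpr bool anyBti(std::uint32_t Insn) {
  return (Insn | 0xc0u) == 0xd50324dfu;
}

// Exact comparison: accepts only bti c.
constexpr bool exactBtiC(std::uint32_t Insn) { return Insn == 0xd503245fu; }

static_assert(maskMatch(0xffffffffu) && !exactBtiC(0xffffffffu),
              "the mask test is too permissive");
```

Since only "bti c" is of interest for PLT entries, the exact comparison is the narrowest of the three.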

MaskRay accepted this revision.Jun 4 2019, 7:35 AM
MaskRay added inline comments.
171 ↗(On Diff #202227)

As I understand it, BTI c can be the target of an indirect call (blr reg) or an indirect jump (tail call) (br x16/x17). Can it be used as a general indirect jump target (br xn where n is neither 16 nor 17), e.g. reused as a jump table target? If that is possible, then BTI jc may make sense.

If BTI jc doesn't make sense, this LGTM.

This revision is now accepted and ready to land.Jun 4 2019, 7:35 AM
peter.smith marked an inline comment as done.Jun 4 2019, 8:10 AM

Thanks for the review.

171 ↗(On Diff #202227)

FWIW I had to ask the same question to one of my colleagues. The J isn't required in this case for some non-obvious reasons. My best summary is:

  • A program loader sets a Guard value on the executable pages of an ELF file that claims BTI support via GNU_PROPERTY_AARCH64_FEATURE_1_BTI. For a program made up of an executable and shared libraries there can be a mixture of support in each ELF file.
  • An indirect branch from a non-guarded page (no BTI support) will always succeed. So an indirect branch via another register is fine.
  • An indirect branch from a guarded page must use x17, x16 in order to be compatible with "BTI C". As a guarded page can only result from a relocatable object with BTI support, we know the compiler can respect that.

There is some more information in the comment for LLVM support for BTI D52867 about the meaning of the flags. The J is meant for jump tables only.
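
The summary above can be turned into a tiny model. This is a deliberately simplified sketch of the rules as stated in this thread, not a description of the full architecture; `Branch` and `landsOnBtiC` are hypothetical names, and treating blr as compatible with "bti c" follows the earlier comment about indirect calls.

```cpp
// Kinds of indirect branch distinguished in the discussion.
enum class Branch { BrX16OrX17, BrOtherReg, Blr };

// Does an indirect branch that lands on "bti c" in a guarded page succeed?
// From a non-guarded page every indirect branch is accepted; from a guarded
// page only blr and br via x16/x17 are compatible with "bti c".
constexpr bool landsOnBtiC(bool SourcePageGuarded, Branch Kind) {
  return !SourcePageGuarded || Kind != Branch::BrOtherReg;
}

static_assert(!landsOnBtiC(true, Branch::BrOtherReg),
              "br via another register from a guarded page needs a J form");
```

In this model the only failing case is `br xn` (n not 16 or 17) issued from a guarded page, which is the case reserved for jump tables and "bti j".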

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project.Jun 4 2019, 9:32 AM