This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF][AArch64] Support for BTI and PAC
ClosedPublic

Authored by peter.smith on May 29 2019, 9:49 AM.

Details

Summary

Branch Target Identification (BTI) and Pointer Authentication (PAC) are architecture features introduced in v8.5a and 8.3a respectively. The new instructions have been added in the hint space so that binaries take advantage of support where it exists yet still run on older hardware.

The impact of each feature is:

  • BTI: For executable pages that have been guarded, all indirect branches must have a destination that is a BTI instruction of the appropriate type. For the static linker, this means that PLT entries must have a "BTI c" as the first instruction in the sequence. BTI is an all or nothing property for a link unit, any indirect branch not landing on a valid destination will cause a Branch Target Exception.
  • PAC: The dynamic loader encodes with PACIA the address of the destination that the PLT entry will load from the .plt.got, placing the result in a subset of the top-bits that are not valid virtual addresses. The PLT entry may authenticate these top-bits using the AUTIA instruction before branching to the destination. Use of PAC in PLT sequences is a contract between the dynamic loader and the static linker, it is independent of whether the relocatable objects use PAC.

BTI and PAC are independent features that can be combined. So we can have several combinations of PLT:

  • Standard with no BTI or PAC
  • BTI PLT with "BTI c" as first instruction.
  • PAC PLT with "AUTIA1716" before the indirect branch to X17.
  • BTIPAC PLT with "BTI c" as first instruction and "AUTIA1716" before the first indirect branch to X17.

The use of BTI and PAC in relocatable object files are encoded by feature bits in the .note.gnu.property section in a similar way to Intel CET. There is one AArch64 specific program property GNU_PROPERTY_AARCH64_FEATURE_1_AND and two target feature bits defined:

  • GNU_PROPERTY_AARCH64_FEATURE_1_BTI
    • All executable sections are compatible with BTI.
  • GNU_PROPERTY_AARCH64_FEATURE_1_PAC
    • All executable sections have return address signing enabled.

Due to the properties of FEATURE_1_AND the static linker can tell when all input relocatable objects have the BTI and PAC feature bits set. The static linker uses this to enable the appropriate PLT sequence.

  • Neither -> standard PLT
  • GNU_PROPERTY_AARCH64_FEATURE_1_BTI -> BTI PLT
  • GNU_PROPERTY_AARCH64_FEATURE_1_PAC -> PAC PLT
  • Both properties -> BTIPAC PLT

In addition to the .note.gnu.properties there are two new command line options:
--force-bti : Act as if all relocatable inputs had GNU_PROPERTY_AARCH64_FEATURE_1_BTI and warn for every relocatable object that does not.
--pac-plt : Act as if all relocatable inputs had GNU_PROPERTY_AARCH64_FEATURE_1_PAC. As PAC is a contract between the loader and static linker no warning is given if it is not present in an input.

Two processor specific dynamic tags are used to communicate that a non standard PLT sequence is being used. DTI_AARCH64_BTI_PLT and DTI_AARCH64_BTI_PAC.

Implementation Notes:
Depends on llvm revisions for the properties, tags, llvm-readobj and llvm-objdump support D62595 D62596 D62598
Depends on lld revision D59780 for the .note.gnu.property implementation. I can remove the Intel CET code if it makes this easier to review?

For the changes to .note.gnu.property:

  • There were some test cases in ld.bfd whereupon a single relocatable object file could have more than on FEATURE_1_AND in the .note.gnu.property section. The result for the relocatable object file was to set the bit for the object if it was set in any of the FEATURE_1_AND properties. I've altered LLD to handle multiple FEATURE_1_AND properties in the same file and added some test cases for that.
  • There may be some merit in not merging the FEATURE_1_AND output into AndFeatures, instead we'd have X86Features and AArch64Features. This would make some of the if (Config->EMachine) checks go away but would mean more changes to the note parsing code.

For the command line options:

  • They have been implemented to match the ld.bfd options ([PATCH, BFD, LD, AArch64, 0/4] Add support for AArch64 BTI and PAC in the linker) note that the command line options changed over the patch review cycle.
  • The --force-bti option is intended for projects that may end up with assembler files without .note.gnu.property sections. A warning was chosen rather than an error to ease the initial porting effort for a project.
  • There is scope to implement a --require-bti option like --require-cet that gives an error rather than warning.

For the PLT sequences:

  • ld.bfd does not put "bti c" on the plt[N] entries when --pie is used. This is because ld.bfd can guarantee that the PLT address won't escape, however there is one small case in LLD where an ifunc has its address taken using a non-got reference when the PLT address can escape. There is potential for optimisation there, but at the moment we have to choose the PLT size early so it isn't easy to apply.

Diff Detail

Repository
rL LLVM

Event Timeline

peter.smith created this revision.May 29 2019, 9:49 AM
MaskRay added inline comments.May 30 2019, 3:32 AM
test/ELF/aarch64-feature-bti.s
12 ↗(On Diff #201955)

Consider llvm-readelf -x .got.plt

44 ↗(On Diff #201955)

Consider llvm-readelf -x .got.plt

test/ELF/aarch64-feature-btipac.s
104 ↗(On Diff #201955)

This insn is misaligned.

test/ELF/aarch64-feature-pac.s
10 ↗(On Diff #201955)

Consider llvm-readelf -x .got.plt

38 ↗(On Diff #201955)

You may use llvm-readelf -x .got.plt to be more specific with the section you intend to dump.

73 ↗(On Diff #201955)

Nit: misaligned (the nit is here just because you align such lines above..)

peter.smith marked 6 inline comments as done.May 30 2019, 9:58 AM

Thanks for the comments, I'll upload another diff with fixes.

test/ELF/aarch64-feature-bti.s
12 ↗(On Diff #201955)

Thanks for the suggestion, will do.

Updated tests to align instructions and use readelf -x .got.plt.

MaskRay added inline comments.Jun 1 2019, 5:22 AM
ELF/Driver.cpp
340 ↗(On Diff #202233)

What about adding

if (Config->EMachine != EM_AARCH64) {
  if (Config->PacPlt) error(...);
  if (Config->ForceBTI) error(...);
}

here? Then, below,

if (Config->EMachine == EM_AARCH64 && Config->ForceBTI &&

The Config->EMachine == EM_AARCH64 check can be removed.

docs/ld.lld.1
188 ↗(On Diff #202233)

.Fl is a mdoc macro that prepends a dash., so this should be`.It Fl -force-bti` (single dash).

189 ↗(On Diff #202233)

Use the full name GNU_PROPERTY_AARCH64_FEATURE_1_BTI?

390 ↗(On Diff #202233)

.It Fl -pac-plt

MaskRay added inline comments.Jun 1 2019, 6:58 AM
ELF/Arch/AArch64.cpp
537 ↗(On Diff #202233)

Should the condition be ANDed with !Config->Shared? Currently with --force-bti a shared object may

543 ↗(On Diff #202233)

Have you considered the alternative: rather than define 4 writePlt*, define a unified one:

if Bti
  write bti c
write adrp/ldr/add
if Pac
  write autiax16x17
write br x17
write padding nop(s)
568 ↗(On Diff #202233)

Is this autia1716?

587 ↗(On Diff #202233)

Is this autia1716?

617 ↗(On Diff #202233)

Config->AndFeatures hasn't been computed when getTargetInfo is called.

Target = getTarget();
...
link<ELF64LE>(Args);  // Config->AndFeatures is computed inside this function
ELF/Driver.cpp
1624 ↗(On Diff #202233)

not 2-space indented.

I have a question: does DT_AARCH64_BTI_PLT make sense for a shared object? If not, you may apply the following here:

if (!Config->Shared)
  Ret &= ~GNU_PROPERTY_AARCH64_FEATURE_1_BTI;

If yes, in getTargetInfo(), you may want to check !Config->Shared beside GNU_PROPERTY_AARCH64_FEATURE_1_BTI.

ELF/SyntheticSections.cpp
1348 ↗(On Diff #202233)

The ABI chooeses two DT_* tags (and with a value of 0! not 1) instead of a single DT_* with 2 bits seems a bit wasteful...

test/ELF/aarch64-feature-bti.s
8 ↗(On Diff #202233)

In binary utility tests, people seem to start using ## to highlight comments from RUN/CHECK lines..

11 ↗(On Diff #202233)

If the hex bytes in the dissassembly output are not important. consider --no-show-raw-insn

38 ↗(On Diff #202233)

doesn't escape -> doesn't escape a shared object

76 ↗(On Diff #202233)

.note.property -> .note.gnu.property

MaskRay added inline comments.Jun 1 2019, 7:04 AM
test/ELF/aarch64-feature-btipac.s
112 ↗(On Diff #202233)

--pac-plt doesn't warn - it doesn't require support in the object files for correctness.

MaskRay added a comment.EditedJun 1 2019, 7:32 AM

I am reading some binutils-gdb/bfd/elfnn-aarch64.c code as a reference. It doesn't use bti c for -pie (ET_DYN). <del>Is it intentional?</del>

It is intentional, because only indirect branch/call targets need BTI. A PIE PLT is not taken address so no BTI is needed.

static void
setup_plt_values (struct bfd_link_info *link_info,
		  aarch64_plt_type plt_type)
{
  struct elf_aarch64_link_hash_table *globals;
  globals = elf_aarch64_hash_table (link_info);

  if (plt_type == PLT_BTI_PAC)
    {
      globals->plt0_entry = elfNN_aarch64_small_plt0_bti_entry;

      /* Only in ET_EXEC we need PLTn with BTI.  */
      if (bfd_link_pde (link_info))
	{
	  globals->plt_entry_size = PLT_BTI_PAC_SMALL_ENTRY_SIZE;
	  globals->plt_entry = elfNN_aarch64_small_plt_bti_pac_entry;
	}
      else
	{
	  globals->plt_entry_size = PLT_PAC_SMALL_ENTRY_SIZE;
	  globals->plt_entry = elfNN_aarch64_small_plt_pac_entry;
	}
    }
  else if (plt_type == PLT_BTI)
    {
      globals->plt0_entry = elfNN_aarch64_small_plt0_bti_entry;

      /* Only in ET_EXEC we need PLTn with BTI.  */
      if (bfd_link_pde (link_info))
	{
	  globals->plt_entry_size = PLT_BTI_SMALL_ENTRY_SIZE;
	  globals->plt_entry = elfNN_aarch64_small_plt_bti_entry;
	}
    }
  else if (plt_type == PLT_PAC)
    {
      globals->plt_entry_size = PLT_PAC_SMALL_ENTRY_SIZE;
      globals->plt_entry = elfNN_aarch64_small_plt_pac_entry;
    }
}

I am reading some binutils-gdb/bfd/elfnn-aarch64.c code as a reference. It doesn't use bti c for -pie (ET_DYN). <del>Is it intentional?</del>

It is intentional, because only indirect branch/call targets need BTI. A PIE PLT is not taken address so no BTI is needed.

Yes this is intentional in BFD. Unfortunately clang can take the address of a non-preemptible ifunc with a non-got generating relocation with -fpie (case when the function is defined in the same object file so the compiler knows it won't be in the DSO), in this case LLD will make the PLT canonical (see Relocations.cpp // Handle a reference to a non-preemptible ifunc), and if a DSO also takes the address of the ifunc then the PLT address can leak to the DSO. I've put this as test/ELF/aarch64-ifunc-bti.s

There are some optimizations that could be performed, assuming an early scan of the relocations is not done for linker performance reasons:

  • Detect when the PLT address might escape and only put the "BTI c" at the head of the entry if it does. Note that we still need to use the larger size as we may have used it before the PLT entry size has been used.
  • Add a command line option that asserts that the program never indirect branches to an escaped PLT entry, or make the reverse assumption and the user has to say

I'm happy to do one of these either in this or a follow up patch.

MaskRay added inline comments.Jun 3 2019, 8:30 AM
ELF/Driver.cpp
340 ↗(On Diff #202233)

You may ignore the comment above. The Config->EMachine == EM_AARCH64 check below cannot be removed.

peter.smith marked 14 inline comments as done.Jun 3 2019, 8:40 AM

Thanks very much for the comments and questions. I'll upload a new patch in a second.

ELF/Arch/AArch64.cpp
537 ↗(On Diff #202233)

No. PLT[n] does an indirect branch "br x17" the destination of which will be PLT[0] in the case of lazy loading. I guess with -zbind-now this could be done.

Standard PLT entry for reference.

adrp x16, Page(&(.plt.got[n]))
ldr  x17, [x16, Offset(&(.plt.got[n]))]
add  x16, x16, Offset(&(.plt.got[n]))
br   x17  // Indirect branch, possibly to PLT[0]
543 ↗(On Diff #202233)

I did try that at first, the disadvantage, particularly for the BTI is that the offsets for the relocations change and it gets a bit messy. Not overly so, but more difficult to follow on first reading. I'm happy to do that if you'd prefer?

568 ↗(On Diff #202233)

Yes it is thanks. The 1716, means authorize x17 in the context of x16, my brain still wants to write 16 before 17.

587 ↗(On Diff #202233)

Yes it is thanks.

617 ↗(On Diff #202233)

At the moment getTarget() is called in two places. The first time will get the AArch64, the second will get AArch64BtiPac. I think that this is ok as the only difference is the PLT generation and that only matters later in the link. The same will be the case for the Intel CET patch D59780

Not sure if there is a good way around this yet.

ELF/Driver.cpp
340 ↗(On Diff #202233)

Good idea, thanks.

1624 ↗(On Diff #202233)

Fixed indentation, thanks for the spot.

We definitely need to propagate GNU_PROPERTY_AARCH64_FEATURE_1_BTI for shared objects, this communicates that all the input relocatable objects and any linker generated content has BTI instructions in the right place.

The PLT[0] sequence does change for shared objects as PLT[0] requires a BTI c (PLT[n] indirectly calls PLT[0]) so even for a Shared Object getTargetInfo() needs to do something different.

I guess it is possible that DT_AARCH64_BTI_PLT doesn't add a lot of value here although the wording in the ABI https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q1-documentation requires it so I'd prefer to stick to that and ld.bfd.

ELF/SyntheticSections.cpp
1348 ↗(On Diff #202233)

Possibly although there is quite a range to choose from [0x70000000, 0x80000000). My limited understanding is that two were chosen as it is faster for a dynamic loader to check for the presence, rather than check the value as well although I don't know how significant this is in practice.

docs/ld.lld.1
188 ↗(On Diff #202233)

Thanks, for pointing that out, now fixed.

peter.smith marked 3 inline comments as done.

New patch to address review comments. Main changes:

  • Updated Driver.cpp to check that --pac-plt and --force-bti are aarch64 only, add test.
  • Correct number of dashes in docs
  • Fix some comments
  • Update tests to use ## for comments, use --no-show-raw-insn.
MaskRay added inline comments.Jun 3 2019, 7:12 PM
ELF/Arch/AArch64.cpp
543 ↗(On Diff #202233)

I do prefer a unified PLT as because this is a big chunk of repetition (4x). While reading the code, I couldn't help comparing the 4 functions just to find the difference is bti/autia1716. Putting them in one place may help.

But I'd like to wait for others to comment on this.

ELF/Driver.cpp
1607 ↗(On Diff #202731)

the a, typo?

ruiu added a comment.Jun 3 2019, 7:18 PM

Peter, do you want me to separate code to read AndFeatures from https://reviews.llvm.org/D59780 and submit?

peter.smith marked 3 inline comments as done.Jun 4 2019, 1:56 AM

Peter, do you want me to separate code to read AndFeatures from https://reviews.llvm.org/D59780 and submit?

Yes please, that would decouple this patch from the CET specific work.

ELF/Arch/AArch64.cpp
543 ↗(On Diff #202233)

I'll send an updated patch with a single unified PLT. If it is too complicated then we can always go back to the previous version.

ELF/Driver.cpp
1607 ↗(On Diff #202731)

Yes, thanks for spotting.

ruiu added a comment.Jun 4 2019, 1:58 AM

Peter, do you want me to separate code to read AndFeatures from https://reviews.llvm.org/D59780 and submit?

Yes please, that would decouple this patch from the CET specific work.

Sure, I'll send you a patch.

peter.smith marked an inline comment as done.

Updated diff with single unified PLT that optionally adds in the BTI prefix and AUTIA1716 before branch. This is less code overall as we don't have 4 similar PLT writing functions at the expense of being slightly more complicated.

Updated patch to rebase against parents D62853 and D62862

ruiu added inline comments.Jun 5 2019, 11:38 PM
ELF/Arch/AArch64.cpp
485–486 ↗(On Diff #202972)

They are used only twice. We inline all other instructions. Maybe we should inline them and remove these varaibles?

511 ↗(On Diff #202972)

I'm not sure whether it is intentional or a bug to not use BtiNeeded. If it's intentional, can you add a comment?

515 ↗(On Diff #202972)

Looks like you can add sizeof(BtiData) to Buf and remove Pre.

566 ↗(On Diff #202972)

Ditto -- do you need Pre? Pre is 0 only when !BtiNeeded, so you can use that condition instead.

ELF/Driver.cpp
1621 ↗(On Diff #202972)

It looks a bit odd that a --force option only warns and is not a hard error. Is this intentional?

peter.smith marked 6 inline comments as done.Jun 6 2019, 2:53 AM

Thanks for the comments I'll upload a new patch shortly.

ELF/Arch/AArch64.cpp
511 ↗(On Diff #202972)

I've split out BtiNeeded into BtiHeader and BtiEntry to make the intention explicit. In summary:

  • BtiHeader is always needed if the FEATURE_1_BTI bit is set.
  • BtiEntry is needed for non-shared libraries if the FEATURE_1_BTI bit is set.
  • PacEntry is always needed if the FEATURE_1_PAC bit is set.

There is no PacHeader as we don't need to alter the header file.

515 ↗(On Diff #202972)

I've updated the patch to do that. Pre also needs to be added to the Plt address so the getAArchPage(Plt + 8 + Pre) gets the right value.

ELF/Driver.cpp
1621 ↗(On Diff #202972)

Yes it is intentional (behaves the same way in ld.bfd). It is intended for early adopters/experimenters with a lot of assembler files or even binary objects that do not have the appropriate .note.gnu.property, yet they are able to declare them compatible with BTI (not many functions in assembly are called indirectly). For example we have some people experimenting with bringing up Android with BTI enabled and they preferred a warning so that they could get it up and running without having to track each last assembler file down. I think over time, when the hardware is closer and more projects have decided to use BTI an option like --require-bti could be added that would behave like --require-cet.

peter.smith marked an inline comment as done.

Updated diff to split BtiNeeded into BtiHeader and BtiEntry, and removed Pre in favour of incrementing Buf.

ruiu accepted this revision.Jun 7 2019, 2:55 AM

LGTM

Thank you for doing this! I can clearly see that this feature is very useful for defense-in-depth.

This revision is now accepted and ready to land.Jun 7 2019, 2:55 AM
MaskRay added inline comments.Jun 7 2019, 5:28 AM
ELF/InputFiles.cpp
975 ↗(On Diff #203318)

You may want to update this comment as well.

peter.smith marked an inline comment as done.Jun 7 2019, 5:46 AM
peter.smith added inline comments.
ELF/InputFiles.cpp
975 ↗(On Diff #203318)

Thanks I'll make it more generic and will do that prior to commit.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2019, 5:57 AM

Handle a reference to a non-preemptible ifunc), and if a DSO also takes the address of the ifunc then the PLT address can leak to the DSO. I've put this as test/ELF/aarch64-ifunc-bti.s

BtiEntry = BtiHeader && !Config->Shared;

This can be relaxed, but it will be a per-PLT property. Sym.NeedsPltAddr || Sym.IsInIplt. The problem is that writePlt doesn't pass the Symbol* argument to the target:

Target->writePlt(Buf + Off, Got, Plt, B->PltIndex, RelOff); // B itself is needed.