Page MenuHomePhabricator

[IR] Define @llvm.ptrauth intrinsics.
Needs ReviewPublic

Authored by ab on Nov 5 2020, 10:47 AM.

Details

Summary

This defines the core @llvm.ptrauth. intrinsics: sign, auth, strip, blend, sign_generic. This also adds a docs/PointerAuth.md which goes into more detail; let me know if anything needs clarifying.

Most of the intrinsics are straightforward to define, except for blend which can be defined and implemented in various ways. To follow are straightforward codegen patches for sign, sign_generic, strip, and blend. auth and resign have a lot more complexity to them.

There are a couple open items for the long-term future. One would be to switch these to opaque pointer types instead of i64 (though i64 is really more accurate, and would hypothetically allow specialized usage on LP32 platforms, for instance).
Also, adding some more specific intrinsics might be useful for further hardening (e.g., an add-and-resign, or a way to check whether a pointer is correctly signed, without running into the llvm.ptrauth.auth UB and traps).
Finally, there are various cases where we need to treat an entire blend + sign/auth/resign sequence as a single operation, so we might want to embed the blend in all intrinsics (concretely, replacing the single i64 discriminators that the intrinsics take with a pair of i32 discriminator and i64 address discriminator - we already need to do that for the constants we use to express relocations).

For a high-level overview, see our llvm-dev RFC: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136091.html, as well as the devmtg talk we did at the same time last year.
For concrete code that builds on this, see last year's staging PR in apple/llvm-project: https://github.com/apple/llvm-project/pull/14 (in particular, the higher level C/C++/Obj-C ABI usage is documented in the clang docs there). Though we've made changes downstream since then, the general concepts and added constructs are mostly identical.

Diff Detail

Unit TestsFailed

TimeTest
40 mslinux > Clang.CodeGen::lto-newpm-pipeline.c
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -O0 /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-O0
430 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
710 mslinux > LLVM.Other::new-pass-manager.ll
Script: -- : 'RUN: at line 8'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -disable-output -disable-verify -debug-pass-manager -passes=no-op-module /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pass-manager.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/Other/new-pass-manager.ll --check-prefix=CHECK-MODULE-PASS
220 mswindows > Clang.APINotes::yaml-roundtrip.test
Script: -- : 'RUN: at line 1'; c:\ws\w64\llvm-project\premerge-checks\build\bin\apinotes-test.exe C:\ws\w64\llvm-project\premerge-checks\clang\test\APINotes/Inputs/Frameworks/Simple.framework/Headers/Simple.apinotes > C:\ws\w64\llvm-project\premerge-checks\build\tools\clang\test\APINotes\Output\yaml-roundtrip.test.tmp.result

Event Timeline

ab created this revision.Nov 5 2020, 10:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2020, 10:47 AM
ab requested review of this revision.Nov 5 2020, 10:47 AM
danielkiss added inline comments.Nov 11 2020, 4:49 PM
llvm/docs/PointerAuth.md
83

I'd call this parameter discriminator, for me it would more intuitive than "extra data".
e.g. llvm.ptrauth.blend takes two discriminators and returns a new one that should go here.

also later we say:

// Sign an unauthenticated pointer using the specified key and discriminator,
// passed in that order.

Architecture call's it modifier because it kind a modifies the key.

kristof.beyls added inline comments.Mar 8 2021, 2:47 AM
llvm/docs/PointerAuth.md
5–8

This is a long sentence, making it somewhat hard to parse/follow. Would it help to split it into shorter sentences? Maybe something like:

Pointer Authentication is a mechanism by which certain pointers are signed.
When a pointer gets signed, a cryptographic hash of its value and other values (pepper and salt) is stored in unused bits of that pointer.
Each time before the pointer is used, it is authenticated, i.e. has its signature checked.
This prevents pointer values of unknown origin from being injected into a process.
10–12

This sentence seems to be describing a specific ABI's choice of how which pointers to sign.
At the moment, my understanding is that there are 2 ABIs making use of the pointer authentication feature in the instruction set to sign/authenticate pointers:

  1. The -mbranch-protection=pac-ret scheme which only signs return addresses.
  2. The arm64e scheme which, IIUC, signs as described by the sentence above.

I think it may be better to make it more explicit that the above paragraph describes the arm64e abi specifically. Alternatively, maybe a more generic description could be given. Maybe something like the following?

Different ABIs or software targets may require a different set of pointers to be signed in specific ways. For example, -mbranch-protection=pac-ret signs return addresses only. The arm64e ABI signs most code pointers, such as function pointers and vtable entries. Furthermore, it also signs certain data pointers such as vtable pointers.
The ABI-prescribed signing of these pointers is generated automatically by the compiler.

This then naturally flows to the following paragraph which starts with "Additionally, with clang extensions, users can specify that a given pointer be signed/authenticated".

17–19

Maybe it reads a bit more naturally to make this a single sentence, since the bullet list is only 1 long? For example:

At the IR level, pointer signing and authentication is represented using a [set of intrinsics](#intrinsics)
53–58

Would it be helpful to refer to the key as being a cryptographic pepper (https://en.wikipedia.org/wiki/Pepper_(cryptography) ), since the discriminator is referred to as "salt"?

57

maybe say "pointer value" rather than "value" to remove potential ambiguity versus key value?

83

I wonder if it'd be better to call this keyid rather than key, since it is not the value of the key, but rather the id of the key that is passed?

83

I also wonder if it would be beneficial to make the name "value" more specific.
For example, here, this could be called "rawpointer"? Where a signed pointer is expected, instead of "value", "signedpointer" could be used? I think that would make the intrinsic a little bit more self-documenting.
Of course, this is bike shedding territory...

83

I agree with @danielkiss that I'd prefer a better name than "extra data". I think that yet another option that might work would be "salt", as the term cryptographic salt is already used above to explain the high-level semantics of these intrinsics.

135

I'm not entirely sure if we'd like to say behavior is undefined when the signature isn't valid.
If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that.
I assume that kind of behavior may be mostly theoretical at the moment, but it probably would be better to not enable elimination of signature checks by the compiler, even if only theoretical.
That being said, I'm not sure what the correct description should be for the behavior. For now, I can't come up with anything better than:

If ``value`` does not have a correct signature for ``key`` and ``extra data``,
the behavior is a target-specific side effect.
207

It seems counter-intuitive to me that resign would return an invalid poison pointer on invalid signature, whereas llvm.ptrauth.auth triggers undefined behaviour on invalid signature on the input signed pointer.
Wouldn't it be better to make the behavior consistent for both intrinsics when the signature of the signed pointer is invalid?

Before we get too far into editorial review, I think we should step back and ask what actually needs to be in this document. In particular, I'm not sure that the discussion of how pointer authentication can be used in an ABI is really appropriate for LLVM-level documentation. We should discuss the formal model we want the intrinsics/constant to provide — secret key registers, well-formed pointers, arbitrary discriminators — and just link to other documentation (e.g. the much longer white paper in the clang docs) for the benefit of people who are curious about how this can be used.

llvm/docs/PointerAuth.md
53–58

I think that's the best mapping onto the conventional terms, yeah. The correct constant/address discriminator for a particular signing purpose is publicly known, but it's supposed to be as different as possible for different purposes; that's basically a salt. The signing key is the same for all signatures (ignoring the different key registers), but it's secret and different for different "sites" (devices); that's basically a pepper. The nature of the problem is a little different, but it's close enough.

Before we get too far into editorial review, I think we should step back and ask what actually needs to be in this document. In particular, I'm not sure that the discussion of how pointer authentication can be used in an ABI is really appropriate for LLVM-level documentation. We should discuss the formal model we want the intrinsics/constant to provide — secret key registers, well-formed pointers, arbitrary discriminators — and just link to other documentation (e.g. the much longer white paper in the clang docs) for the benefit of people who are curious about how this can be used.

Good point, that makes sense to me. After a quick glance it seems there are only 2 short paragraphs that maybe need to be removed (or be replaced with pointers to the other documentation).