This is an archive of the discontinued LLVM Phabricator instance.

[IR] Define @llvm.ptrauth intrinsics.
ClosedPublic

Authored by ab on Nov 5 2020, 10:47 AM.

Details

Summary

This defines the core @llvm.ptrauth. intrinsics: sign, auth, strip, blend, sign_generic. This also adds a docs/PointerAuth.md which goes into more detail; let me know if anything needs clarifying.

Most of the intrinsics are straightforward to define, except for blend which can be defined and implemented in various ways. To follow are straightforward codegen patches for sign, sign_generic, strip, and blend. auth and resign have a lot more complexity to them.

There are a couple open items for the long-term future. One would be to switch these to opaque pointer types instead of i64 (though i64 is really more accurate, and would hypothetically allow specialized usage on LP32 platforms, for instance).
Also, adding some more specific intrinsics might be useful for further hardening (e.g., an add-and-resign, or a way to check whether a pointer is correctly signed, without running into the llvm.ptrauth.auth UB and traps).
Finally, there are various cases where we need to treat an entire blend + sign/auth/resign sequence as a single operation, so we might want to embed the blend in all intrinsics (concretely, replacing the single i64 discriminators that the intrinsics take with a pair of i32 discriminator and i64 address discriminator - we already need to do that for the constants we use to express relocations).

For a high-level overview, see our llvm-dev RFC: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136091.html, as well as the devmtg talk we did at the same time last year.
For concrete code that builds on this, see last year's staging PR in apple/llvm-project: https://github.com/apple/llvm-project/pull/14 (in particular, the higher level C/C++/Obj-C ABI usage is documented in the clang docs there). Though we've made changes downstream since then, the general concepts and added constructs are mostly identical.

Diff Detail

Event Timeline

ab created this revision.Nov 5 2020, 10:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2020, 10:47 AM
ab requested review of this revision.Nov 5 2020, 10:47 AM
danielkiss added inline comments.Nov 11 2020, 4:49 PM
llvm/docs/PointerAuth.md
84

I'd call this parameter discriminator, for me it would more intuitive than "extra data".
e.g. llvm.ptrauth.blend takes two discriminators and returns a new one that should go here.

also later we say:

// Sign an unauthenticated pointer using the specified key and discriminator,
// passed in that order.

Architecture call's it modifier because it kind a modifies the key.

kristof.beyls added inline comments.Mar 8 2021, 2:47 AM
llvm/docs/PointerAuth.md
6–9

This is a long sentence, making it somewhat hard to parse/follow. Would it help to split it into shorter sentences? Maybe something like:

Pointer Authentication is a mechanism by which certain pointers are signed.
When a pointer gets signed, a cryptographic hash of its value and other values (pepper and salt) is stored in unused bits of that pointer.
Each time before the pointer is used, it is authenticated, i.e. has its signature checked.
This prevents pointer values of unknown origin from being injected into a process.
11–13

This sentence seems to be describing a specific ABI's choice of how which pointers to sign.
At the moment, my understanding is that there are 2 ABIs making use of the pointer authentication feature in the instruction set to sign/authenticate pointers:

  1. The -mbranch-protection=pac-ret scheme which only signs return addresses.
  2. The arm64e scheme which, IIUC, signs as described by the sentence above.

I think it may be better to make it more explicit that the above paragraph describes the arm64e abi specifically. Alternatively, maybe a more generic description could be given. Maybe something like the following?

Different ABIs or software targets may require a different set of pointers to be signed in specific ways. For example, -mbranch-protection=pac-ret signs return addresses only. The arm64e ABI signs most code pointers, such as function pointers and vtable entries. Furthermore, it also signs certain data pointers such as vtable pointers.
The ABI-prescribed signing of these pointers is generated automatically by the compiler.

This then naturally flows to the following paragraph which starts with "Additionally, with clang extensions, users can specify that a given pointer be signed/authenticated".

18–20

Maybe it reads a bit more naturally to make this a single sentence, since the bullet list is only 1 long? For example:

At the IR level, pointer signing and authentication is represented using a [set of intrinsics](#intrinsics)
54–59

Would it be helpful to refer to the key as being a cryptographic pepper (https://en.wikipedia.org/wiki/Pepper_(cryptography) ), since the discriminator is referred to as "salt"?

58

maybe say "pointer value" rather than "value" to remove potential ambiguity versus key value?

84

I wonder if it'd be better to call this keyid rather than key, since it is not the value of the key, but rather the id of the key that is passed?

84

I also wonder if it would be beneficial to make the name "value" more specific.
For example, here, this could be called "rawpointer"? Where a signed pointer is expected, instead of "value", "signedpointer" could be used? I think that would make the intrinsic a little bit more self-documenting.
Of course, this is bike shedding territory...

84

I agree with @danielkiss that I'd prefer a better name than "extra data". I think that yet another option that might work would be "salt", as the term cryptographic salt is already used above to explain the high-level semantics of these intrinsics.

136

I'm not entirely sure if we'd like to say behavior is undefined when the signature isn't valid.
If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that.
I assume that kind of behavior may be mostly theoretical at the moment, but it probably would be better to not enable elimination of signature checks by the compiler, even if only theoretical.
That being said, I'm not sure what the correct description should be for the behavior. For now, I can't come up with anything better than:

If ``value`` does not have a correct signature for ``key`` and ``extra data``,
the behavior is a target-specific side effect.
208

It seems counter-intuitive to me that resign would return an invalid poison pointer on invalid signature, whereas llvm.ptrauth.auth triggers undefined behaviour on invalid signature on the input signed pointer.
Wouldn't it be better to make the behavior consistent for both intrinsics when the signature of the signed pointer is invalid?

Before we get too far into editorial review, I think we should step back and ask what actually needs to be in this document. In particular, I'm not sure that the discussion of how pointer authentication can be used in an ABI is really appropriate for LLVM-level documentation. We should discuss the formal model we want the intrinsics/constant to provide — secret key registers, well-formed pointers, arbitrary discriminators — and just link to other documentation (e.g. the much longer white paper in the clang docs) for the benefit of people who are curious about how this can be used.

llvm/docs/PointerAuth.md
54–59

I think that's the best mapping onto the conventional terms, yeah. The correct constant/address discriminator for a particular signing purpose is publicly known, but it's supposed to be as different as possible for different purposes; that's basically a salt. The signing key is the same for all signatures (ignoring the different key registers), but it's secret and different for different "sites" (devices); that's basically a pepper. The nature of the problem is a little different, but it's close enough.

Before we get too far into editorial review, I think we should step back and ask what actually needs to be in this document. In particular, I'm not sure that the discussion of how pointer authentication can be used in an ABI is really appropriate for LLVM-level documentation. We should discuss the formal model we want the intrinsics/constant to provide — secret key registers, well-formed pointers, arbitrary discriminators — and just link to other documentation (e.g. the much longer white paper in the clang docs) for the benefit of people who are curious about how this can be used.

Good point, that makes sense to me. After a quick glance it seems there are only 2 short paragraphs that maybe need to be removed (or be replaced with pointers to the other documentation).

ab marked 9 inline comments as done.May 25 2021, 2:40 PM

Rewrote a good chunk of the document following reviews, responded inline to some. Thanks all for the comments!

One question: I realized this mentions the hardening constraints in a few isolated spots. Should we have a paragraph (say in "Concepts") that describes this upfront? Could be summed up as "intermediate values shouldn't be exposed", and the concrete ramifications of that for the IR design and backend impl.

llvm/docs/PointerAuth.md
18–20

Ah right, this looks weird because the list only includes the intrinsics that are defined here. The later patches add the other IR constructs (attributes, bundle, ...), each with another list item here

84

This all makes sense to me, I've removed most of the non-helpful distinctions between "extra data", "diversity" and "discriminator", to consistently use "discriminator".

I agree value is not super clear, I tried to be more specific in the argument descriptions (using "raw pointer" and "signed pointer"). I'm trying really hard to avoid "unsigned", but "signed" might be fine ;)

I've used "unauthenticated" interchangeably with "raw" before (to make it explicitly about ptrauth), but I think it's better to limit that to operations that don't authenticate (e.g., in the ptrauth_sign_unauthenticated clang builtin, as opposed to ptrauth_auth_and_resign)

105

Note that this is also described as "undefined", as in, "we assume it never happens", but I welcome any improvements to the wording.
However, this is less interesting than auth/resign, since there isn't much room for sensible behavior when signing an already-signed pointer, so we may not need to be very specific.

136

That's an interesting question. Since this was written, we've converged to making it a hard requirement for auth/resign operations to trap. In our implementation, there's currently still an escape hatch (via cl::opt and attribute) to disable that, but we're planning to get rid of that. Whether we want that upstream or not is an open question (really for all of you): it can helps with bringup, but with FPAC maybe we're better off always forcing the pre-FPAC codegen to have the SW check/trap.

Regarding:

If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that.

yes, it does allow that, and our implementation does optimize away auth/resigns with unused results.
Which brings up an interesting point: since we trap in the auth intrinsic itself, using it to check the signature is at best unergonomic (you'd have to do catch the trap somehow). So, for cases where the intention is only to check the signature of a pointer without using it, we've been using a different pattern: strip the original pointer, sign it with the expected scheme, and compare that with the original pointer. That's a little obscure (and has some hardening ramifications), so we've considered adding an intrinsic that does that. But it hasn't been a common pattern.

So, concretely, I'd be in favor of explicitly defining this (and resign) as trapping. "target-specific side effect" sounds reasonable and avoids over-specifying it as well, though I don't have a specific worry in mind with describing the trap.

169

Reading this again I think this "undefined" is less justifiable than the auth/resign ones. I'm expanding this a bit to explain the actual constraint, and describe this as target-specific. Depending on how exactly we define "target" this could be more than that, since it's dependent on the runtime OS setup. But I'm not sure that's helpful here.

208

It's indeed consistent, the doc is wrong ;) I will update the resign paragraph to match whatever we settle on for auth

ab updated this revision to Diff 347788.May 25 2021, 2:40 PM
ab marked 2 inline comments as done.
rjmccall added inline comments.May 25 2021, 11:16 PM
llvm/docs/PointerAuth.md
136

I think it would be good to reserve the right to not trap if the result is unused, but otherwise, yes, I agree that this should not have undefined behavior on invalidly-signed inputs.

zzheng added a subscriber: zzheng.Jun 14 2021, 8:56 AM
ab updated this revision to Diff 361678.Jul 26 2021, 8:42 AM

Rebase; describe auth/resign as side-effect-free (matching IntrNoMem) and mention they can be eliminated.

ab marked an inline comment as done.Jul 26 2021, 8:46 AM

Thanks all so far! Any other suggestions or comments?

llvm/docs/PointerAuth.md
136

I mentioned that they are still defined to be side-effect-free (matching their IntrNoMem definition in Intrinsics.td) and can be eliminated. How does that sound?

rjmccall added inline comments.Jul 26 2021, 2:20 PM
llvm/docs/PointerAuth.md
136

SGTM

pbarrio added inline comments.
llvm/docs/PointerAuth.md
18–20

Since at least part of the implementation is independent of the architecture, would it make sense to reword to something like: "The initial/current/original implementation leverages the [ARMv8.3 Pointer Authentication Code](#armv8-3-pointer-authentication-code)
instructions on the [AArch64 back-end](#aarch64-support), and follows the rules defined in the Darwin arm64e ABI". As it stands now, it gave me the impression that it is bespoke work for AArch64 and arm64e, when in fact it is actually quite independent and extensible to other architectures and ABIs (which is great BTW).

Also ok to leave it as-is, and only change it once support for other ABIs (or architectures) appears.

23

Maybe I'm jumping a bit ahead of myself here.

I was reading the fully-patched toolchain (e.g. here: https://github.com/pcc/llvm-project/tree/apple-pac3) and I noticed most of these concepts are explained in clang/docs/PointerAuthentication.rst in more detail. Should we have them only in one place and reference the other document? I really like how the Clang document explains the high-level workings of PAuth, and this document seems more focused on the internal implementation in LLVM. Ofc you may have other plans for the two-document split :)

For example, we could just point out the correspondence with the other document in the next section:

## LLVM IR Representation

### Intrinsics

The intrinsics implement the three fundamental operations in PAuth (sign, auth, and strip), as well as a bundle (resign), a generic data signing and an operation to help generate different types of discriminators (blend). For more information on PAuth operations, check out <link to Clang document Section "Basic concepts">.  Also, <link to Clang document Section "Discriminators"> explains discriminators in detail.

Or something like that.

I would expect anyone interested in this document to have read the other document first, or at least similar Arm documents, so they should already be familiar with how PAuth works.

289

Nit: replace ARMv8.3 by Armv8.3-A to follow Arm's trademark guidelines: https://www.arm.com/company/policies/trademarks/arm-trademark-list/arm-trademark. There are a few more occurrences of this in the patch.

pbarrio added inline comments.Jul 30 2021, 3:41 AM
llvm/docs/PointerAuth.md
282–283

What does target implementation mean here? is it the target triple combination? I assume something like this but then the next paragraph says "AArch64 is currently the only target...". Maybe worth defining what we mean by target, or replace the next paragraph's "target" by "architecture".

Matt added a subscriber: Matt.Aug 10 2021, 1:25 PM
ab updated this revision to Diff 382117.Oct 25 2021, 2:06 PM
ab marked an inline comment as done.
ab added a reviewer: bruno.

Rebased; updated docs following Pablo's comments.

ab marked 2 inline comments as done.Oct 25 2021, 2:41 PM
ab added inline comments.
llvm/docs/PointerAuth.md
23

Yep, the clang page comes after, but I added a link here in the later commit that introduces it.

Ahmed, thanks for rebasing this patch.
Shouldn't we add a reference for the AArch64 PAuth ELF ABI document: https://github.com/ARM-software/abi-aa/tree/main/pauthabielf64
I only see reference to Darwin arm64e ABI documentation.

Also, did we agree on a common prefix for pointer authentication?
I see 'ptrauth' in this patch, and in Apple's arm64e ABI for pointer authentication.
But AArch64 Pointer Authentication ABI extension to ELF refers to 'pauth'.
It will be less confusing to use the same acronym.

Can you highlight clearer what might be specific to Apple in these intrinsics declarations?
I mean, if we want to support the intrinsics for aarch64, now that we have the AArch64 Pointer Authentication ABI extension to ELF document, we should be able to reuse the intrinsics as they are defined now.

Well, we've been using ptrauth as a keyword/prefix/etc. to write software at Apple for more than four years, and we have tens of thousands of lines of code using that scattered across several dozen projects, which is probably at least half of the pointer authentication code in the world. I did specifically reach out to the ARM ELF group in the early days of that effort asking that they standardize on ptrauth instead of introducing yet another abbreviation for the extension, but they'd already made a file with "pauth" in the name, so now for better or worse I think we're stuck with having yet another abbreviation for the extension.

I agree we should link to the ARM ELF PAuth document and explicitly mention the different names. IIRC there's a bit already in the document which talks about the other names like "PAC".

ab updated this revision to Diff 383827.Nov 1 2021, 10:13 AM
ab marked 2 inline comments as done.

Link to ELF PAuth ABI Extension. Refer to ISA extension as "PAuth" (and clarify relationship with "Armv8.3-a" and "Pointer Authentication Code")

ab added a comment.Nov 1 2021, 10:26 AM

Shouldn't we add a reference for the AArch64 PAuth ELF ABI document: https://github.com/ARM-software/abi-aa/tree/main/pauthabielf64
I only see reference to Darwin arm64e ABI documentation.

I was thinking documentation about ELF support would be added at the same time as ELF support itself, the same way the paragraph about arm64e asm/mach-o extensions is added in later patches. But arm64e does have a mention here, so this does deserve a mention and a link as well. I'll let you folks add more detailed docs later, if needed.

Can you highlight clearer what might be specific to Apple in these intrinsics declarations?
I mean, if we want to support the intrinsics for aarch64, now that we have the AArch64 Pointer Authentication ABI extension to ELF document, we should be able to reuse the intrinsics as they are defined now.

I think the main bit that doesn't follow from the ISA is the implementation of @llvm.ptrauth.blend. I believe the ELF ABI does the same as arm64e, but there's always the possibility some other platform ends up with a different implementation. Either way, the intrinsic doesn't specify its implementation, so all the intrinsics defined here shouldn't be specific to the Darwin or ELF ABIs.
The concepts of "address"/"integer" "discriminators" (and thus "blend" itself) are also higher-level concepts that don't directly map to the ISA, but seem widely applicable to most usage of pointer authentication.

Also, did we agree on a common prefix for pointer authentication?
I see 'ptrauth' in this patch, and in Apple's arm64e ABI for pointer authentication.
But AArch64 Pointer Authentication ABI extension to ELF refers to 'pauth'.
It will be less confusing to use the same acronym.

Good point, I changed various references to Armv8.3-a/PAC to focus on "PAuth", since that's the current name of the ISA feature. Like John described, I don't think it's realistic to change "ptrauth" for our usage, and I would add that overall it seems okay that we'd have different nomenclature for different things, at the source level and in compilers, vs at lower levels (Darwin's "arm64e" is already quite different; not to mention all of our "System V" or "Itanium")

ab added a comment.Nov 11 2021, 9:01 AM

Hey folks! Any more comments?

thanks Ahmed for addressing the comments. LGTM.

danielkiss accepted this revision.Nov 11 2021, 12:01 PM

LGTM, I'm happy with the current state.

This revision is now accepted and ready to land.Nov 11 2021, 12:01 PM
This revision was landed with ongoing or failed builds.Nov 14 2021, 8:05 AM
This revision was automatically updated to reflect the committed changes.
ab added a comment.Nov 14 2021, 8:06 AM

Thanks all for the reviews! 68854f4e572a