This is an archive of the discontinued LLVM Phabricator instance.


Differential D90868

[IR] Define @llvm.ptrauth intrinsics.
Closed · Public

Authored by ab on Nov 5 2020, 10:47 AM.


Details

Reviewers

pcc
psmith
apazos
kristof.beyls
rjmccall
t.p.northover
chill
danielkiss
pbarrio
bruno

Commits

rG68854f4e572a: [IR] Define ptrauth intrinsics.

Summary

This defines the core @llvm.ptrauth.* intrinsics: sign, auth, resign, strip, blend, and sign_generic. This also adds a docs/PointerAuth.md, which goes into more detail; let me know if anything needs clarifying.

Most of the intrinsics are straightforward to define, except for blend, which can be defined and implemented in various ways. Straightforward codegen patches for sign, sign_generic, strip, and blend will follow; auth and resign have a lot more complexity to them.

There are a couple of open items for the long-term future. One would be to switch these to opaque pointer types instead of i64 (though i64 is really more accurate, and would hypothetically allow specialized usage on LP32 platforms, for instance).
Also, adding some more specific intrinsics might be useful for further hardening (e.g., an add-and-resign, or a way to check whether a pointer is correctly signed without running into the llvm.ptrauth.auth UB and traps).
Finally, there are various cases where we need to treat an entire blend + sign/auth/resign sequence as a single operation, so we might want to embed the blend in all intrinsics (concretely, replacing the single i64 discriminator that the intrinsics take with a pair of an i32 discriminator and an i64 address discriminator; we already need to do that for the constants we use to express relocations).

For a high-level overview, see our llvm-dev RFC: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136091.html, as well as the devmtg talk we did at the same time last year.
For concrete code that builds on this, see last year's staging PR in apple/llvm-project: https://github.com/apple/llvm-project/pull/14 (in particular, the higher level C/C++/Obj-C ABI usage is documented in the clang docs there). Though we've made changes downstream since then, the general concepts and added constructs are mostly identical.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ab created this revision. · Nov 5 2020, 10:47 AM

Herald added a project: Restricted Project. · Nov 5 2020, 10:47 AM

Herald added subscribers: pzheng, jdoerfert.

ab requested review of this revision. · Nov 5 2020, 10:47 AM

tschuett added a subscriber: tschuett. · Nov 5 2020, 11:05 AM

Harbormaster completed remote builds in B77745: Diff 301675. · Nov 5 2020, 12:44 PM

ab mentioned this in D91087: [AArch64] Select PAC/PACGA for ptrauth.sign/sign_generic. · Nov 9 2020, 8:55 AM

ab added a child revision: D91087: [AArch64] Select PAC/PACGA for ptrauth.sign/sign_generic.

danielkiss added inline comments. · Nov 11 2020, 4:49 PM

llvm/docs/PointerAuth.md
84	I'd call this parameter `discriminator`; to me it would be more intuitive than "extra data". E.g., llvm.ptrauth.blend takes two `discriminators` and returns a new one that should go here. Also, later we say: "// Sign an unauthenticated pointer using the specified key and discriminator, // passed in that order." The architecture calls it `modifier` because it kind of modifies the key.

tschuett mentioned this in D92834: [IR,AArch64] Add a new "ptrauth(...)" Constant to represent signed pointers. · Dec 8 2020, 10:39 AM

danielkiss mentioned this in D98008: [AArch64][compiler-rt] Strip PAC from the link register. · Mar 5 2021, 11:52 AM

kristof.beyls added inline comments. · Mar 8 2021, 2:47 AM

llvm/docs/PointerAuth.md
6–9	This is a long sentence, making it somewhat hard to parse/follow. Would it help to split it into shorter sentences? Maybe something like: Pointer Authentication is a mechanism by which certain pointers are signed. When a pointer gets signed, a cryptographic hash of its value and other values (pepper and salt) is stored in unused bits of that pointer. Each time before the pointer is used, it is authenticated, i.e. has its signature checked. This prevents pointer values of unknown origin from being injected into a process.
11–13	This sentence seems to be describing a specific ABI's choice of which pointers to sign. At the moment, my understanding is that there are 2 ABIs making use of the pointer authentication feature in the instruction set to sign/authenticate pointers: the -mbranch-protection=pac-ret scheme, which only signs return addresses, and the arm64e scheme, which, IIUC, signs as described by the sentence above. I think it may be better to make it more explicit that the above paragraph describes the arm64e ABI specifically. Alternatively, a more generic description could be given, maybe something like the following: "Different ABIs or software targets may require a different set of pointers to be signed in specific ways. For example, -mbranch-protection=pac-ret signs return addresses only. The arm64e ABI signs most code pointers, such as function pointers and vtable entries. Furthermore, it also signs certain data pointers such as vtable pointers. The ABI-prescribed signing of these pointers is generated automatically by the compiler." This then naturally flows to the following paragraph, which starts with "Additionally, with clang extensions, users can specify that a given pointer be signed/authenticated".
18–20	Maybe it reads a bit more naturally to make this a single sentence, since the bullet list is only 1 long? For example: At the IR level, pointer signing and authentication is represented using a [set of intrinsics](#intrinsics)
54–59	Would it be helpful to refer to the key as being a cryptographic pepper (https://en.wikipedia.org/wiki/Pepper_(cryptography) ), since the discriminator is referred to as "salt"?
58	maybe say "pointer value" rather than "value" to remove potential ambiguity versus key value?
84	I wonder if it'd be better to call this keyid rather than key, since it is not the value of the key, but rather the id of the key that is passed?
84	I also wonder if it would be beneficial to make the name "value" more specific. For example, here, this could be called "rawpointer"? Where a signed pointer is expected, instead of "value", "signedpointer" could be used? I think that would make the intrinsic a little bit more self-documenting. Of course, this is bike shedding territory...
84	I agree with @danielkiss that I'd prefer a better name than "extra data". I think that yet another option that might work would be "salt", as the term cryptographic salt is already used above to explain the high-level semantics of these intrinsics.
136	I'm not entirely sure if we'd like to say behavior is undefined when the signature isn't valid. If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that. I assume that kind of behavior may be mostly theoretical at the moment, but it probably would be better to not enable elimination of signature checks by the compiler, even if only theoretical. That being said, I'm not sure what the correct description should be for the behavior. For now, I can't come up with anything better than: If ``value`` does not have a correct signature for ``key`` and ``extra data``, the behavior is a target-specific side effect.
208	It seems counter-intuitive to me that resign would return an invalid poison pointer on invalid signature, whereas llvm.ptrauth.auth triggers undefined behaviour on invalid signature on the input signed pointer. Wouldn't it be better to make the behavior consistent for both intrinsics when the signature of the signed pointer is invalid?

Before we get too far into editorial review, I think we should step back and ask what actually needs to be in this document. In particular, I'm not sure that the discussion of how pointer authentication can be used in an ABI is really appropriate for LLVM-level documentation. We should discuss the formal model we want the intrinsics/constant to provide — secret key registers, well-formed pointers, arbitrary discriminators — and just link to other documentation (e.g. the much longer white paper in the clang docs) for the benefit of people who are curious about how this can be used.

llvm/docs/PointerAuth.md
54–59	I think that's the best mapping onto the conventional terms, yeah. The correct constant/address discriminator for a particular signing purpose is publicly known, but it's supposed to be as different as possible for different purposes; that's basically a salt. The signing key is the same for all signatures (ignoring the different key registers), but it's secret and different for different "sites" (devices); that's basically a pepper. The nature of the problem is a little different, but it's close enough.

In D90868#2613237, @rjmccall wrote:

Before we get too far into editorial review, I think we should step back and ask what actually needs to be in this document. In particular, I'm not sure that the discussion of how pointer authentication can be used in an ABI is really appropriate for LLVM-level documentation. We should discuss the formal model we want the intrinsics/constant to provide — secret key registers, well-formed pointers, arbitrary discriminators — and just link to other documentation (e.g. the much longer white paper in the clang docs) for the benefit of people who are curious about how this can be used.

Good point, that makes sense to me. After a quick glance it seems there are only 2 short paragraphs that maybe need to be removed (or be replaced with pointers to the other documentation).

Rewrote a good chunk of the document following reviews, responded inline to some. Thanks all for the comments!

One question: I realized this mentions the hardening constraints in a few isolated spots. Should we have a paragraph (say in "Concepts") that describes this upfront? Could be summed up as "intermediate values shouldn't be exposed", and the concrete ramifications of that for the IR design and backend impl.

llvm/docs/PointerAuth.md
18–20	Ah right, this looks weird because the list only includes the intrinsics that are defined here. The later patches add the other IR constructs (attributes, bundle, ...), each with another list item here.
84	This all makes sense to me. I've removed most of the non-helpful distinctions between "extra data", "diversity", and "discriminator", to consistently use "discriminator". I agree "value" is not super clear; I tried to be more specific in the argument descriptions (using "raw pointer" and "signed pointer"). I'm trying really hard to avoid "unsigned", but "signed" might be fine ;) I've used "unauthenticated" interchangeably with "raw" before (to make it explicitly about ptrauth), but I think it's better to limit that to operations that don't authenticate (e.g., in the `ptrauth_sign_unauthenticated` clang builtin, as opposed to `ptrauth_auth_and_resign`).
105	Note that this is also described as "undefined", as in, "we assume it never happens", but I welcome any improvements to the wording. However, this is less interesting than auth/resign, since there isn't much room for sensible behavior when signing an already-signed pointer, so we may not need to be very specific.
136	That's an interesting question. Since this was written, we've converged on making it a hard requirement for auth/resign operations to trap. In our implementation, there's currently still an escape hatch (via a cl::opt and an attribute) to disable that, but we're planning to get rid of it. Whether we want that upstream is an open question (really for all of you): it can help with bring-up, but with FPAC maybe we're better off always forcing the pre-FPAC codegen to have the SW check/trap. Regarding 'If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that': yes, it does allow that, and our implementation does optimize away auth/resigns with unused results. Which brings up an interesting point: since we trap in the auth intrinsic itself, using it to check the signature is at best unergonomic (you'd have to catch the trap somehow). So, for cases where the intention is only to check the signature of a pointer without using it, we've been using a different pattern: strip the original pointer, sign it with the expected scheme, and compare that with the original pointer. That's a little obscure (and has some hardening ramifications), so we've considered adding an intrinsic that does that, but it hasn't been a common pattern. So, concretely, I'd be in favor of explicitly defining this (and `resign`) as trapping. "Target-specific side effect" sounds reasonable and avoids over-specifying it as well, though I don't have a specific worry in mind with describing the trap.
169	Reading this again I think this "undefined" is less justifiable than the auth/resign ones. I'm expanding this a bit to explain the actual constraint, and describe this as target-specific. Depending on how exactly we define "target" this could be more than that, since it's dependent on the runtime OS setup. But I'm not sure that's helpful here.
208	It's indeed consistent, the doc is wrong ;) I will update the resign paragraph to match whatever we settle on for `auth`
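The check-only pattern from the reply to the line 136 comment can be sketched with a toy model. That pattern is: strip the pointer, sign it with the expected scheme, and compare the result with the original. The assumptions are the same kind as before:

- 48-bit addresses, with a 16-bit signature in the top bits.
- HMAC-SHA256 as a stand-in for the hardware MAC.
- Hypothetical random keys indexed by the key operand.

`has_valid_signature` is a hypothetical helper, not an LLVM intrinsic. One likely hardening concern is visible in the code: the freshly re-signed value is a valid signature for whatever address the input carried, so it should not escape the comparison.

```python
import hashlib
import hmac
import secrets

# Toy model assumptions: 48-bit addresses, a 16-bit signature in the top bits,
# HMAC-SHA256 as the keyed MAC, and hypothetical random keys per key id.
ADDR_MASK = (1 << 48) - 1
KEYS = [secrets.token_bytes(16) for _ in range(4)]


def ptrauth_sign(raw: int, key: int, discriminator: int) -> int:
    """Toy stand-in for llvm.ptrauth.sign."""
    msg = (raw & ADDR_MASK).to_bytes(8, "little") + (discriminator % 2**64).to_bytes(8, "little")
    pac = int.from_bytes(hmac.new(KEYS[key], msg, hashlib.sha256).digest()[:2], "little")
    return (pac << 48) | (raw & ADDR_MASK)


def ptrauth_strip(value: int, key: int) -> int:
    """Toy stand-in for llvm.ptrauth.strip (the key is unused here)."""
    return value & ADDR_MASK


def has_valid_signature(value: int, key: int, discriminator: int) -> bool:
    """Hypothetical helper: check a signature without trapping by stripping,
    re-signing with the expected key/discriminator, and comparing."""
    # The re-signed value is a valid signature for whatever address `value`
    # carried, so in real code it must not escape this comparison.
    resigned = ptrauth_sign(ptrauth_strip(value, key), key, discriminator)
    return resigned == value
```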

ab updated this revision to Diff 347788. · May 25 2021, 2:40 PM

ab marked 2 inline comments as done.

Harbormaster completed remote builds in B106160: Diff 347788. · May 25 2021, 2:41 PM

rjmccall added inline comments. · May 25 2021, 11:16 PM

llvm/docs/PointerAuth.md
136	I think it would be good to reserve the right to not trap if the result is unused, but otherwise, yes, I agree that this should not have undefined behavior on invalidly-signed inputs.
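This follows the agreement here that auth should not have undefined behavior on invalidly signed inputs, and ab's note that resign should match auth. The toy sketch below shows a `resign` whose failure behavior is the same as `auth`: authenticate under the old key and discriminator (trapping on failure), then sign under the new ones.

The assumptions are the same as in a toy model: 48-bit addresses, a 16-bit HMAC-based signature, and hypothetical random keys. A real lowering must also keep the raw intermediate pointer from being observable, which this Python model cannot express.

```python
import hashlib
import hmac
import secrets

# Toy model assumptions: 48-bit addresses, a 16-bit signature in the top bits,
# HMAC-SHA256 as the keyed MAC, and hypothetical random keys per key id.
ADDR_MASK = (1 << 48) - 1
KEYS = [secrets.token_bytes(16) for _ in range(4)]


class PtrAuthTrap(Exception):
    """Stands in for the target-specific trap on a failed authentication."""


def ptrauth_sign(raw: int, key: int, discriminator: int) -> int:
    """Toy stand-in for llvm.ptrauth.sign."""
    msg = (raw & ADDR_MASK).to_bytes(8, "little") + (discriminator % 2**64).to_bytes(8, "little")
    pac = int.from_bytes(hmac.new(KEYS[key], msg, hashlib.sha256).digest()[:2], "little")
    return (pac << 48) | (raw & ADDR_MASK)


def ptrauth_auth(value: int, key: int, discriminator: int) -> int:
    """Toy stand-in for llvm.ptrauth.auth: trap (raise) on a bad signature."""
    raw = value & ADDR_MASK
    if ptrauth_sign(raw, key, discriminator) != value:
        raise PtrAuthTrap(f"bad signature on {value:#018x}")
    return raw


def ptrauth_resign(value: int, old_key: int, old_disc: int,
                   new_key: int, new_disc: int) -> int:
    """Like llvm.ptrauth.resign: an invalid input traps exactly as
    ptrauth_auth does, instead of producing a poison value."""
    return ptrauth_sign(ptrauth_auth(value, old_key, old_disc), new_key, new_disc)
```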

zzheng added a subscriber: zzheng. · Jun 14 2021, 8:56 AM

Rebase; describe auth/resign as side-effect-free (matching IntrNoMem) and mention they can be eliminated.

Harbormaster completed remote builds in B116192: Diff 361678. · Jul 26 2021, 8:43 AM

Thanks all so far! Any other suggestions or comments?

llvm/docs/PointerAuth.md
136	I mentioned that they are still defined to be side-effect-free (matching their IntrNoMem definition in Intrinsics.td) and can be eliminated. How does that sound?

rjmccall added inline comments. · Jul 26 2021, 2:20 PM

llvm/docs/PointerAuth.md
136	SGTM

pbarrio added a subscriber: pbarrio. · Jul 30 2021, 3:37 AM

pbarrio added inline comments.

llvm/docs/PointerAuth.md
18–20	Since at least part of the implementation is independent of the architecture, would it make sense to reword to something like: "The initial/current/original implementation leverages the [ARMv8.3 Pointer Authentication Code](#armv8-3-pointer-authentication-code) instructions on the [AArch64 back-end](#aarch64-support), and follows the rules defined in the Darwin arm64e ABI". As it stands now, it gave me the impression that it is bespoke work for AArch64 and arm64e, when in fact it is actually quite independent and extensible to other architectures and ABIs (which is great BTW). Also ok to leave it as-is, and only change it once support for other ABIs (or architectures) appears.
23	Maybe I'm jumping a bit ahead of myself here. I was reading the fully-patched toolchain (e.g. here: https://github.com/pcc/llvm-project/tree/apple-pac3) and I noticed most of these concepts are explained in clang/docs/PointerAuthentication.rst in more detail. Should we have them only in one place and reference the other document? I really like how the Clang document explains the high-level workings of PAuth, and this document seems more focused on the internal implementation in LLVM. Ofc you may have other plans for the two-document split :) For example, we could just point out the correspondence with the other document in the next section: ## LLVM IR Representation ### Intrinsics The intrinsics implement the three fundamental operations in PAuth (sign, auth, and strip), as well as a bundle (resign), a generic data signing and an operation to help generate different types of discriminators (blend). For more information on PAuth operations, check out <link to Clang document Section "Basic concepts">. Also, <link to Clang document Section "Discriminators"> explains discriminators in detail. Or something like that. I would expect anyone interested in this document to have read the other document first, or at least similar Arm documents, so they should already be familiar with how PAuth works.
289	Nit: replace ARMv8.3 by Armv8.3-A to follow Arm's trademark guidelines: https://www.arm.com/company/policies/trademarks/arm-trademark-list/arm-trademark. There are a few more occurrences of this in the patch.

pbarrio added inline comments. · Jul 30 2021, 3:41 AM

llvm/docs/PointerAuth.md
282–283	What does "target implementation" mean here? Is it the target triple combination? I assume something like this, but then the next paragraph says "AArch64 is currently the only target...". Maybe it's worth defining what we mean by target, or replacing the next paragraph's "target" with "architecture".

n.nerovny added a subscriber: n.nerovny. · Aug 9 2021, 7:49 AM

Matt added a subscriber: Matt. · Aug 10 2021, 1:25 PM

pbarrio added a reviewer: pbarrio. · Aug 26 2021, 3:42 AM

Rebased; updated docs following Pablo's comments.

ab marked 2 inline comments as done. · Oct 25 2021, 2:41 PM

ab added inline comments.

llvm/docs/PointerAuth.md
23	Yep, the clang page comes after, but I added a link here in the later commit that introduces it.

Harbormaster completed remote builds in B130551: Diff 382117. · Oct 25 2021, 2:50 PM

Ahmed, thanks for rebasing this patch.
Shouldn't we add a reference for the AArch64 PAuth ELF ABI document: https://github.com/ARM-software/abi-aa/tree/main/pauthabielf64
I only see reference to Darwin arm64e ABI documentation.

Also, did we agree on a common prefix for pointer authentication?
I see 'ptrauth' in this patch, and in Apple's arm64e ABI for pointer authentication.
But AArch64 Pointer Authentication ABI extension to ELF refers to 'pauth'.
It will be less confusing to use the same acronym.

Can you highlight clearer what might be specific to Apple in these intrinsics declarations?
I mean, if we want to support the intrinsics for aarch64, now that we have the AArch64 Pointer Authentication ABI extension to ELF document, we should be able to reuse the intrinsics as they are defined now.

Well, we've been using ptrauth as a keyword/prefix/etc. to write software at Apple for more than four years, and we have tens of thousands of lines of code using that scattered across several dozen projects, which is probably at least half of the pointer authentication code in the world. I did specifically reach out to the ARM ELF group in the early days of that effort asking that they standardize on ptrauth instead of introducing yet another abbreviation for the extension, but they'd already made a file with "pauth" in the name, so now for better or worse I think we're stuck with having yet another abbreviation for the extension.

I agree we should link to the ARM ELF PAuth document and explicitly mention the different names. IIRC there's a bit already in the document which talks about the other names like "PAC".

Link to ELF PAuth ABI Extension. Refer to ISA extension as "PAuth" (and clarify relationship with "Armv8.3-a" and "Pointer Authentication Code")

In D90868#3097845, @apazos wrote:

Shouldn't we add a reference for the AArch64 PAuth ELF ABI document: https://github.com/ARM-software/abi-aa/tree/main/pauthabielf64
I only see reference to Darwin arm64e ABI documentation.

I was thinking documentation about ELF support would be added at the same time as ELF support itself, the same way the paragraph about arm64e asm/mach-o extensions is added in later patches. But arm64e does have a mention here, so this does deserve a mention and a link as well. I'll let you folks add more detailed docs later, if needed.

In D90868#3097884, @apazos wrote:

Can you highlight clearer what might be specific to Apple in these intrinsics declarations?
I mean, if we want to support the intrinsics for aarch64, now that we have the AArch64 Pointer Authentication ABI extension to ELF document, we should be able to reuse the intrinsics as they are defined now.

I think the main bit that doesn't follow from the ISA is the implementation of @llvm.ptrauth.blend. I believe the ELF ABI does the same as arm64e, but there's always the possibility some other platform ends up with a different implementation. Either way, the intrinsic doesn't specify its implementation, so all the intrinsics defined here shouldn't be specific to the Darwin or ELF ABIs.
The concepts of "address"/"integer" "discriminators" (and thus "blend" itself) are also higher-level concepts that don't directly map to the ISA, but seem widely applicable to most usage of pointer authentication.
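Here is a sketch of one possible `@llvm.ptrauth.blend` implementation. It assumes an arm64e-style scheme, based on our understanding rather than anything this patch specifies: the 16-bit integer discriminator overwrites the top 16 bits of the 64-bit address discriminator. As noted above, the intrinsic leaves the implementation to the target, so other ABIs may differ. `ptrauth_blend` is a hypothetical Python name.

```python
# Hypothetical arm64e-style blend (an assumption; the intrinsic leaves the
# implementation to the target): keep the low 48 bits of the address
# discriminator and put the 16-bit integer discriminator in the top 16 bits.
LOW48_MASK = (1 << 48) - 1


def ptrauth_blend(address_discriminator: int, integer_discriminator: int) -> int:
    """Combine an address discriminator and a small integer discriminator."""
    if not 0 <= integer_discriminator <= 0xFFFF:
        raise ValueError("integer discriminator must fit in 16 bits")
    return (address_discriminator & LOW48_MASK) | (integer_discriminator << 48)
```

For example, `ptrauth_blend(0x00007FFFDEADBEE0, 0x1234)` gives `0x12347FFFDEADBEE0`. Blending the same integer discriminator with two different storage addresses yields two different discriminators, which is what ties a signed pointer to the location it is stored in.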

In D90868#3097864, @apazos wrote:

Also, did we agree on a common prefix for pointer authentication?
I see 'ptrauth' in this patch, and in Apple's arm64e ABI for pointer authentication.
But AArch64 Pointer Authentication ABI extension to ELF refers to 'pauth'.
It will be less confusing to use the same acronym.

Good point, I changed various references to Armv8.3-a/PAC to focus on "PAuth", since that's the current name of the ISA feature. Like John described, I don't think it's realistic to change "ptrauth" for our usage, and I would add that overall it seems okay that we'd have different nomenclature for different things, at the source level and in compilers, vs at lower levels (Darwin's "arm64e" is already quite different; not to mention all of our "System V" or "Itanium")

Harbormaster completed remote builds in B131765: Diff 383827. · Nov 1 2021, 10:52 AM

ab mentioned this in D112941: [clang] Add support for the new pointer authentication builtins. · Nov 1 2021, 11:12 AM

ab added a child revision: D112941: [clang] Add support for the new pointer authentication builtins.

arichardson added a subscriber: arichardson. · Nov 2 2021, 1:24 AM

Hey folks! Any more comments?

ab mentioned this in D113685: [IR] Define "ptrauth" operand bundle. · Nov 11 2021, 9:08 AM

ab added a child revision: D113685: [IR] Define "ptrauth" operand bundle.

thanks Ahmed for addressing the comments. LGTM.

LGTM, I'm happy with the current state.

This revision is now accepted and ready to land. · Nov 11 2021, 12:01 PM

This revision was landed with ongoing or failed builds. · Nov 14 2021, 8:05 AM

Closed by commit rG68854f4e572a: [IR] Define ptrauth intrinsics. (authored by ab).

This revision was automatically updated to reflect the committed changes.

ab added a commit: rG68854f4e572a: [IR] Define ptrauth intrinsics.

Thanks all for the reviews! 68854f4e572a

Revision Contents

llvm/docs/LangRef.rst (7 lines)
llvm/docs/PointerAuth.md (260 lines)
llvm/docs/Reference.rst (5 lines)
llvm/include/llvm/IR/Intrinsics.td (55 lines)

Diff 387095

llvm/docs/LangRef.rst



 Exception Handling Intrinsics
 -----------------------------

 The LLVM exception handling intrinsics (which all start with
 ``llvm.eh.`` prefix), are described in the `LLVM Exception
 Handling <ExceptionHandling.html#format-common-intrinsics>`_ document.

+Pointer Authentication Intrinsics
+---------------------------------
+
+The LLVM pointer authentication intrinsics (which all start with
+``llvm.ptrauth.`` prefix), are described in the `Pointer Authentication
+<PointerAuth.html#intrinsics>`_ document.
+
 .. _int_trampoline:

 Trampoline Intrinsics
 ---------------------

 These intrinsics make it possible to excise one parameter, marked with
 the :ref:`nest <nest>` attribute, from a function. The result is a
 callable function pointer lacking the nest parameter - the caller does

llvm/docs/PointerAuth.md

This file was added.

				# Pointer Authentication

				## Introduction

				Pointer Authentication is a mechanism by which certain pointers are signed.
				When a pointer gets signed, a cryptographic hash of its value and other values
				(pepper and salt) is stored in unused bits of that pointer.

				Before the pointer is used, it needs to be authenticated, i.e., have its
				signature checked. This prevents pointer values of unknown origin from being
				used to replace the signed pointer value.

				At the IR level, it is represented using a [set of intrinsics](#intrinsics)
				(to sign/authenticate pointers).

				The current implementation leverages the
				[Armv8.3-A PAuth/Pointer Authentication Code](#armv8-3-a-pauth-pointer-authentication-code)
				instructions in the [AArch64 backend](#aarch64-support).
				This support is used to implement the Darwin arm64e ABI, as well as the
				[PAuth ABI Extension to ELF](https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst).


				## LLVM IR Representation

				### Intrinsics

				These intrinsics are provided by LLVM to expose pointer authentication
				operations.


				#### '``llvm.ptrauth.sign``'

				##### Syntax:

				```llvm
				declare i64 @llvm.ptrauth.sign(i64 <value>, i32 <key>, i64 <discriminator>)
				```

				##### Overview:

				The '``llvm.ptrauth.sign``' intrinsic signs a raw pointer.


				##### Arguments:

				The ``value`` argument is the raw pointer value to be signed.
				The ``key`` argument is the identifier of the key to be used to generate the
				signed value.
				The ``discriminator`` argument is the additional diversity data to be used as a
				discriminator (an integer, an address, or a blend of the two).

				##### Semantics:

				The '``llvm.ptrauth.sign``' intrinsic implements the `sign`_ operation.
				It returns a signed value.

				If ``value`` is already a signed value, the behavior is undefined.

				kristof.beyls: maybe say "pointer value" rather than "value" to remove potential ambiguity versus key value?
				kristof.beyls: Would it be helpful to refer to the key as being a cryptographic pepper (https://en.wikipedia.org/wiki/Pepper_(cryptography) ), since the discriminator is referred to as "salt"?
				rjmccall: I think that's the best mapping onto the conventional terms, yeah. The correct constant/address discriminator for a particular signing purpose is publicly known, but it's supposed to be as different as possible for different purposes; that's basically a salt. The signing key is the same for all signatures (ignoring the different key registers), but it's secret and different for different "sites" (devices); that's basically a pepper. The nature of the problem is a little different, but it's close enough.

				If ``value`` is not a pointer value for which ``key`` is appropriate, the
				behavior is undefined.
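
For illustration only, the following C sketch models `sign` under a purely hypothetical toy scheme: raw pointers are assumed to fit in bits 0..47, the signature is assumed to occupy bits 48..63, and the per-key secrets, the signature function, and all names (`toy_key_secret`, `fold16`, `toy_sig`, `toy_sign`) are invented. It is not the Armv8.3-A PAC computation and provides no security; it only shows that the embedded signature depends on the pointer, the key, and the discriminator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy layout: raw pointers use only bits 0..47, and a signed
   pointer carries a 16-bit toy signature in bits 48..63. */
#define TOY_ADDR_MASK 0x0000FFFFFFFFFFFFULL

/* Hypothetical per-key secrets, indexed by the IR key identifier (0..3). */
static const uint16_t toy_key_secret[4] = {0x1234, 0x5678, 0x9ABC, 0xDEF0};

/* Fold a 64-bit value to 16 bits by XOR-ing its four 16-bit chunks. */
static uint16_t fold16(uint64_t x) {
    return (uint16_t)(x ^ (x >> 16) ^ (x >> 32) ^ (x >> 48));
}

/* Toy signature over (raw pointer, key, discriminator). NOT secure: it only
   models that the signature depends on all three inputs. */
static uint16_t toy_sig(uint64_t raw, uint32_t key, uint64_t disc) {
    return (uint16_t)((fold16(raw) ^ toy_key_secret[key]) + fold16(disc));
}

/* Toy model of llvm.ptrauth.sign. In IR, signing a value that is not a raw
   pointer is undefined; this toy asserts the precondition instead. */
static uint64_t toy_sign(uint64_t raw, uint32_t key, uint64_t disc) {
    assert(key < 4 && (raw & ~TOY_ADDR_MASK) == 0);
    return raw | ((uint64_t)toy_sig(raw, key, disc) << 48);
}
```

In this toy, signing the same pointer with a different key or discriminator changes only the high bits; the address bits are preserved.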


				#### '``llvm.ptrauth.auth``'

				##### Syntax:

				```llvm
				declare i64 @llvm.ptrauth.auth(i64 <value>, i32 <key>, i64 <discriminator>)
				```

				##### Overview:

				The '``llvm.ptrauth.auth``' intrinsic authenticates a signed pointer.

				##### Arguments:

				The ``value`` argument is the signed pointer value to be authenticated.
				The ``key`` argument is the identifier of the key that was used to generate
				the signed value.
				The ``discriminator`` argument is the additional diversity data to be used as a
				discriminator.

				danielkiss: I'd call this parameter `discriminator`, for me it would be more intuitive than "extra data". e.g. llvm.ptrauth.blend takes two `discriminators` and returns a new one that should go here. also later we say: // Sign an unauthenticated pointer using the specified key and discriminator, // passed in that order. The architecture calls it `modifier` because it kind of modifies the key.
				kristof.beyls: I agree with @danielkiss that I'd prefer a better name than "extra data". I think that yet another option that might work would be "salt", as the term cryptographic salt is already used above to explain the high-level semantics of these intrinsics.
				kristof.beyls: I wonder if it'd be better to call this keyid rather than key, since it is not the value of the key, but rather the id of the key that is passed?
				kristof.beyls: I also wonder if it would be beneficial to make the name "value" more specific. For example, here, this could be called "rawpointer"? Where a signed pointer is expected, instead of "value", "signedpointer" could be used? I think that would make the intrinsic a little bit more self-documenting. Of course, this is bike shedding territory...
				ab: This all makes sense to me, I've removed most of the non-helpful distinctions between "extra data", "diversity" and "discriminator", to consistently use "discriminator". I agree value is not super clear, I tried to be more specific in the argument descriptions (using "raw pointer" and "signed pointer"). I'm trying really hard to avoid "unsigned", but "signed" might be fine ;) I've used "unauthenticated" interchangeably with "raw" before (to make it explicitly about ptrauth), but I think it's better to limit that to operations that don't authenticate (e.g., in the `ptrauth_sign_unauthenticated` clang builtin, as opposed to `ptrauth_auth_and_resign`)

				##### Semantics:

				The '``llvm.ptrauth.auth``' intrinsic implements the `auth`_ operation.
				It returns a raw pointer value.
				If ``value`` does not have a correct signature for ``key`` and ``discriminator``,
				the intrinsic traps in a target-specific way.
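
A minimal C sketch of `auth`, under a purely hypothetical toy scheme (16-bit signature assumed in bits 48..63; invented secrets, signature function, and names such as `toy_auth` and `toy_has_valid_sig`). `abort()` stands in for the target-specific trap. The non-trapping helper follows the strip, re-sign, and compare pattern mentioned in the review discussion; it is an illustration, not an LLVM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy layout: signature in bits 48..63, address in 0..47. */
#define TOY_ADDR_MASK 0x0000FFFFFFFFFFFFULL

/* Hypothetical per-key secrets, indexed by the IR key identifier (0..3). */
static const uint16_t toy_key_secret[4] = {0x1234, 0x5678, 0x9ABC, 0xDEF0};

/* Fold a 64-bit value to 16 bits by XOR-ing its four 16-bit chunks. */
static uint16_t fold16(uint64_t x) {
    return (uint16_t)(x ^ (x >> 16) ^ (x >> 32) ^ (x >> 48));
}

/* Toy signature (NOT secure) over raw pointer, key, and discriminator. */
static uint16_t toy_sig(uint64_t raw, uint32_t key, uint64_t disc) {
    return (uint16_t)((fold16(raw) ^ toy_key_secret[key]) + fold16(disc));
}

/* Toy sign: embed the signature in the high bits of a raw pointer. */
static uint64_t toy_sign(uint64_t raw, uint32_t key, uint64_t disc) {
    assert(key < 4 && (raw & ~TOY_ADDR_MASK) == 0);
    return raw | ((uint64_t)toy_sig(raw, key, disc) << 48);
}

/* Non-trapping check: strip, re-sign with the expected key and
   discriminator, and compare with the original signed pointer. */
static bool toy_has_valid_sig(uint64_t signed_ptr, uint32_t key,
                              uint64_t disc) {
    uint64_t raw = signed_ptr & TOY_ADDR_MASK; /* strip */
    return toy_sign(raw, key, disc) == signed_ptr;
}

/* Toy model of llvm.ptrauth.auth: return the raw pointer on success,
   otherwise trap (abort() stands in for the target-specific trap). */
static uint64_t toy_auth(uint64_t signed_ptr, uint32_t key, uint64_t disc) {
    if (!toy_has_valid_sig(signed_ptr, key, disc))
        abort();
    return signed_ptr & TOY_ADDR_MASK;
}
```

The IR intrinsic itself has no non-trapping form, which is why the separate check helper exists in this sketch.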


				#### '``llvm.ptrauth.strip``'

				##### Syntax:

				```llvm
				declare i64 @llvm.ptrauth.strip(i64 <value>, i32 <key>)
				```

				##### Overview:

				The '``llvm.ptrauth.strip``' intrinsic strips the embedded signature out of a
				possibly-signed pointer.


				ab: Note that this is also described as "undefined", as in, "we assume it never happens", but I welcome any improvements to the wording. However, this is less interesting than auth/resign, since there isn't much room for sensible behavior when signing an already-signed pointer, so we may not need to be very specific.

				##### Arguments:

				The ``value`` argument is the signed pointer value to be stripped.
				The ``key`` argument is the identifier of the key that was used to generate
				the signed value.

				##### Semantics:

				The '``llvm.ptrauth.strip``' intrinsic implements the `strip`_ operation.
				It returns a raw pointer value. It does not check that the
				signature is valid.

				``key`` should identify a key that is appropriate for ``value``, as defined
				by the target-specific [keys](#keys).

				If ``value`` is a raw pointer value, it is returned as-is (provided the ``key``
				is appropriate for the pointer).

				If ``value`` is not a pointer value for which ``key`` is appropriate, the
				behavior is target-specific.

				If ``value`` is a signed pointer value, but ``key`` does not identify the
				same key that was used to generate ``value``, the behavior is
				target-specific.
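
A C sketch of `strip` under a hypothetical toy layout (signature assumed in bits 48..63; `toy_strip` is an invented name). The signature bits are cleared without being checked, and a raw pointer comes back unchanged. In this toy the key does not affect the result; on AArch64, the key's class selects between ``XPACI`` and ``XPACD`` in the instruction mapping given in the AArch64 section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy layout: signature in bits 48..63, address in 0..47. */
#define TOY_ADDR_MASK 0x0000FFFFFFFFFFFFULL

/* Toy model of llvm.ptrauth.strip: drop the signature bits without
   validating them. The key is accepted to mirror the intrinsic, but the
   toy layout does not depend on it. */
static uint64_t toy_strip(uint64_t value, uint32_t key) {
    assert(key < 4); /* IR key identifiers 0..3 */
    return value & TOY_ADDR_MASK;
}
```

Because nothing is checked, a pointer carrying a bogus signature strips to the same raw address as a correctly signed one.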


				#### '``llvm.ptrauth.resign``'

				##### Syntax:

				```llvm
				declare i64 @llvm.ptrauth.resign(i64 <value>,
				i32 <old key>, i64 <old discriminator>,
				i32 <new key>, i64 <new discriminator>)
				```

				kristof.beyls: I'm not entirely sure if we'd like to say behavior is undefined when the signature isn't valid. If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that. I assume that kind of behavior may be mostly theoretical at the moment, but it probably would be better to not enable elimination of signature checks by the compiler, even if only theoretical. That being said, I'm not sure what the correct description should be for the behavior. For now, I can't come up with anything better than: If ``value`` does not have a correct signature for ``key`` and ``extra data``, the behavior is a target-specific side effect.
				ab: That's an interesting question. Since this was written, we've converged to making it a hard requirement for auth/resign operations to trap. In our implementation, there's currently still an escape hatch (via cl::opt and attribute) to disable that, but we're planning to get rid of that. Whether we want that upstream or not is an open question (really for all of you): it can help with bringup, but with FPAC maybe we're better off always forcing the pre-FPAC codegen to have the SW check/trap. Regarding: "If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that." Yes, it does allow that, and our implementation does optimize away auth/resigns with unused results. Which brings up an interesting point: since we trap in the auth intrinsic itself, using it to check the signature is at best unergonomic (you'd have to catch the trap somehow). So, for cases where the intention is only to check the signature of a pointer without using it, we've been using a different pattern: strip the original pointer, sign it with the expected scheme, and compare that with the original pointer. That's a little obscure (and has some hardening ramifications), so we've considered adding an intrinsic that does that. But it hasn't been a common pattern. So, concretely, I'd be in favor of explicitly defining this (and `resign`) as trapping. "target-specific side effect" sounds reasonable and avoids over-specifying it as well, though I don't have a specific worry in mind with describing the trap.
				rjmccall: I think it would be good to reserve the right to not trap if the result is unused, but otherwise, yes, I agree that this should not have undefined behavior on invalidly-signed inputs.
				ab: I mentioned that they are still defined to be side-effect-free (matching their IntrNoMem definition in Intrinsics.td) and can be eliminated. How does that sound?
				rjmccall: SGTM

				##### Overview:

				The '``llvm.ptrauth.resign``' intrinsic re-signs a signed pointer using
				a different key and diversity data.

				##### Arguments:

				The ``value`` argument is the signed pointer value to be authenticated.
				The ``old key`` argument is the identifier of the key that was used to generate
				the signed value.
				The ``old discriminator`` argument is the additional diversity data to be used
				as a discriminator in the auth operation.
				The ``new key`` argument is the identifier of the key to use to generate the
				re-signed value.
				The ``new discriminator`` argument is the additional diversity data to be used
				as a discriminator in the sign operation.

				##### Semantics:

				The '``llvm.ptrauth.resign``' intrinsic performs a combined `auth`_ and `sign`_
				operation, without exposing the intermediate raw pointer.
				It returns a signed pointer value.
				If ``value`` does not have a correct signature for ``old key`` and
				``old discriminator``, the intrinsic traps in a target-specific way.
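
A C sketch of `resign` under a purely hypothetical toy scheme (16-bit signature assumed in bits 48..63; invented secrets, signature function, and names such as `toy_resign`). It authenticates with the old key and discriminator, traps via `abort()` on a mismatch, and signs with the new ones. Plain C cannot promise that the intermediate raw pointer stays out of memory; on AArch64 that guarantee comes from lowering to a single backend pseudo-instruction, as described in the AArch64 section.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy layout: signature in bits 48..63, address in 0..47. */
#define TOY_ADDR_MASK 0x0000FFFFFFFFFFFFULL

/* Hypothetical per-key secrets, indexed by the IR key identifier (0..3). */
static const uint16_t toy_key_secret[4] = {0x1234, 0x5678, 0x9ABC, 0xDEF0};

/* Fold a 64-bit value to 16 bits by XOR-ing its four 16-bit chunks. */
static uint16_t fold16(uint64_t x) {
    return (uint16_t)(x ^ (x >> 16) ^ (x >> 32) ^ (x >> 48));
}

/* Toy signature (NOT secure) over raw pointer, key, and discriminator. */
static uint16_t toy_sig(uint64_t raw, uint32_t key, uint64_t disc) {
    return (uint16_t)((fold16(raw) ^ toy_key_secret[key]) + fold16(disc));
}

/* Toy sign: embed the signature in the high bits of a raw pointer. */
static uint64_t toy_sign(uint64_t raw, uint32_t key, uint64_t disc) {
    assert(key < 4 && (raw & ~TOY_ADDR_MASK) == 0);
    return raw | ((uint64_t)toy_sig(raw, key, disc) << 48);
}

/* Toy model of llvm.ptrauth.resign: auth with the old schema, then sign
   with the new one. The raw pointer only exists in a local variable. */
static uint64_t toy_resign(uint64_t signed_ptr,
                           uint32_t old_key, uint64_t old_disc,
                           uint32_t new_key, uint64_t new_disc) {
    uint64_t raw = signed_ptr & TOY_ADDR_MASK;
    if ((uint16_t)(signed_ptr >> 48) != toy_sig(raw, old_key, old_disc))
        abort(); /* stands in for the target-specific trap */
    return toy_sign(raw, new_key, new_disc);
}
```

Re-signing in this toy gives the same result as signing the raw pointer directly with the new key and discriminator.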

				ab: Reading this again I think this "undefined" is less justifiable than the auth/resign ones. I'm expanding this a bit to explain the actual constraint, and describe this as target-specific. Depending on how exactly we define "target" this could be more than that, since it's dependent on the runtime OS setup. But I'm not sure that's helpful here.

				#### '``llvm.ptrauth.sign_generic``'

				##### Syntax:

				```llvm
				declare i64 @llvm.ptrauth.sign_generic(i64 <value>, i64 <discriminator>)
				```

				##### Overview:

				The '``llvm.ptrauth.sign_generic``' intrinsic computes a generic signature of
				arbitrary data.

				##### Arguments:

				The ``value`` argument is the arbitrary data value to be signed.
				The ``discriminator`` argument is the additional diversity data to be used as a
				discriminator.

				##### Semantics:

				The '``llvm.ptrauth.sign_generic``' intrinsic computes the signature of a given
				combination of value and additional diversity data.

				It returns a full signature value (as opposed to a signed pointer value, with
				an embedded partial signature).

				Unlike [``llvm.ptrauth.sign``](#llvm-ptrauth-sign), it does not interpret
				``value`` as a pointer value; ``value`` is treated as arbitrary data.
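
A toy C sketch of `sign_generic` and of the check pattern described in the `Intrinsics.td` comments (recompute the signature and compare it with `icmp`). The secret standing in for the implicit ``ASGA`` key, the rotate-and-add signature function, and the names `toy_sign_generic` and `toy_check_generic` are hypothetical and insecure. The point is that the result is a full 64-bit value computed over arbitrary data, rather than a pointer with signature bits embedded in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical secret standing in for the implicit ASGA key. */
static const uint64_t toy_generic_secret = 0x0F1E2D3C4B5A6978ULL;

/* Toy model of llvm.ptrauth.sign_generic: a full 64-bit signature of
   arbitrary data; value is not interpreted as a pointer. NOT secure. */
static uint64_t toy_sign_generic(uint64_t value, uint64_t disc) {
    uint64_t x = value ^ toy_generic_secret;
    x = (x << 13) | (x >> 51); /* rotate left by 13 bits */
    return x + disc;
}

/* There is no generic auth intrinsic: recompute the signature and compare,
   which corresponds to an icmp eq in IR. */
static bool toy_check_generic(uint64_t value, uint64_t disc, uint64_t sig) {
    return toy_sign_generic(value, disc) == sig;
}
```

Changing either the data or the discriminator changes the toy signature, so the comparison fails.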


				#### '``llvm.ptrauth.blend``'

				##### Syntax:

				```llvm
				declare i64 @llvm.ptrauth.blend(i64 <address discriminator>, i64 <integer discriminator>)
				```

				##### Overview:

				The '``llvm.ptrauth.blend``' intrinsic blends a pointer address discriminator
				with a small integer discriminator to produce a new "blended" discriminator.

				kristof.beyls: It seems counter-intuitive to me that resign would return an invalid poison pointer on invalid signature, whereas llvm.ptrauth.auth triggers undefined behaviour on invalid signature on the input signed pointer. Wouldn't it be better to make the behavior consistent for both intrinsics when the signature of the signed pointer is invalid?
				ab: It's indeed consistent, the doc is wrong ;) I will update the resign paragraph to match whatever we settle on for `auth`

				##### Arguments:

				The ``address discriminator`` argument is a pointer value.
				The ``integer discriminator`` argument is a small integer, as specified by the
				target.

				##### Semantics:

				The '``llvm.ptrauth.blend``' intrinsic combines a small integer discriminator
				with a pointer address discriminator, in a way that is specified by the target
				implementation.


				## AArch64 Support

				AArch64 is currently the only architecture with full support for the pointer
				authentication primitives, based on Armv8.3-A instructions.

				### Armv8.3-A PAuth Pointer Authentication Code

				The Armv8.3-A architecture extension defines the PAuth feature, which provides
				support for instructions that manipulate Pointer Authentication Codes (PAC).

				#### Keys

				Five keys are supported by the PAuth feature.

				Of those, four keys can be used interchangeably to specify the key in IR
				constructs:
				* ``ASIA``/``ASIB`` are instruction keys (encoded as respectively 0 and 1).
				* ``ASDA``/``ASDB`` are data keys (encoded as respectively 2 and 3).

				``ASGA`` is a special key that cannot be explicitly specified, and is only ever
				used implicitly, to implement the
				[``llvm.ptrauth.sign_generic``](#llvm-ptrauth-sign-generic) intrinsic.
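
The key encodings can be captured as a C enum. The enum, constant, and helper names below are hypothetical; only the numeric values 0 to 3 come from this section. ``ASGA`` deliberately has no entry, since it cannot be named in IR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values of the i32 key operand of the IR intrinsics on AArch64. */
enum toy_ptrauth_key {
    TOY_KEY_ASIA = 0, /* instruction key A */
    TOY_KEY_ASIB = 1, /* instruction key B */
    TOY_KEY_ASDA = 2, /* data key A */
    TOY_KEY_ASDB = 3, /* data key B */
};

/* ASGA has no encoding: it is only used implicitly by sign_generic. */
static bool is_valid_ir_key(uint32_t key) {
    return key <= (uint32_t)TOY_KEY_ASDB;
}

static bool is_instruction_key(uint32_t key) {
    return key == TOY_KEY_ASIA || key == TOY_KEY_ASIB;
}
```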

				#### Instructions

				The IR [Intrinsics](#intrinsics) described above map onto these
				instructions as follows:
				* [``llvm.ptrauth.sign``](#llvm-ptrauth-sign): ``PAC{I,D}{A,B}{Z,SP,}``
				* [``llvm.ptrauth.auth``](#llvm-ptrauth-auth): ``AUT{I,D}{A,B}{Z,SP,}``
				* [``llvm.ptrauth.strip``](#llvm-ptrauth-strip): ``XPAC{I,D}``
				* [``llvm.ptrauth.blend``](#llvm-ptrauth-blend): The semantics of the blend
				operation are specified by the ABI. In both the ELF PAuth ABI Extension and
				arm64e, it's a ``MOVK`` into the high 16 bits. Consequently, this limits
				the width of the integer discriminator used in blends to 16 bits.
				* [``llvm.ptrauth.sign_generic``](#llvm-ptrauth-sign-generic): ``PACGA``
				* [``llvm.ptrauth.resign``](#llvm-ptrauth-resign): ``AUT+PAC``. These are
				represented as a single pseudo-instruction in the backend to guarantee that
				the intermediate raw pointer value is not spilled and attackable.
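
The AArch64 blend described above amounts to simple bit manipulation. This C sketch (the name `blend_movk` is hypothetical) mirrors a ``MOVK`` of the integer discriminator into bits 48..63 of the address discriminator, leaving bits 0..47 untouched; the `uint16_t` parameter reflects the 16-bit limit on the integer discriminator.

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent of MOVK Xd, #int_disc, LSL #48 applied to the address
   discriminator: replace bits 48..63, keep bits 0..47. */
static uint64_t blend_movk(uint64_t addr_disc, uint16_t int_disc) {
    return (addr_disc & 0x0000FFFFFFFFFFFFULL) | ((uint64_t)int_disc << 48);
}
```

Any bits already present in bits 48..63 of the address discriminator are overwritten, just as ``MOVK`` replaces that halfword.
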

				pbarrio: Nit: replace ARMv8.3 by Armv8.3-A to follow Arm's trademark guidelines: https://www.arm.com/company/policies/trademarks/arm-trademark-list/arm-trademark. There are a few more occurrences of this in the patch.
				pbarrio: What does target implementation mean here? is it the target triple combination? I assume something like this but then the next paragraph says "AArch64 is currently the only target...". Maybe worth defining what we mean by target, or replace the next paragraph's "target" by "architecture".

llvm/docs/Reference.rst

.. toctree::
   …
   HowToUseAttributes
   InAlloca
   LangRef
   LibFuzzer
   MarkedUpDisassembly
   MIRLangRef
   OptBisect
   PDB/index
   PointerAuth
   ScudoHardenedAllocator
   MemTagSanitizer
   Security
   SegmentedStacks
   StackMaps
   SpeculativeLoadHardening
   Statepoints
   SystemLibrary

…

:doc:`StackMaps`
  LLVM support for mapping instruction addresses to the location of
  values and allowing code to be patched.

:doc:`Coroutines`
  LLVM support for coroutines.

:doc:`PointerAuth`
  A description of pointer authentication, its LLVM IR representation, and its
  support in the backend.

:doc:`YamlIO`
  A reference guide for using LLVM's YAML I/O library.

llvm/include/llvm/IR/Intrinsics.td

	…

	//===---------- Named shufflevector intrinsics ------===//
	def int_experimental_vector_splice : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
	[LLVMMatchType<0>,
	LLVMMatchType<0>,
	llvm_i32_ty],
	[IntrNoMem, ImmArg<ArgIndex<2>>]>;


				//===----------------- Pointer Authentication Intrinsics ------------------===//
				//

				// Sign an unauthenticated pointer using the specified key and discriminator,
				// passed in that order.
				// Returns the first argument, with some known bits replaced with a signature.
				def int_ptrauth_sign : Intrinsic<[llvm_i64_ty],
				[llvm_i64_ty, llvm_i32_ty, llvm_i64_ty],
				[IntrNoMem, ImmArg<ArgIndex<1>>]>;

				// Authenticate a signed pointer, using the specified key and discriminator.
				// Returns the first argument, with the signature bits removed.
				// The signature must be valid.
				def int_ptrauth_auth : Intrinsic<[llvm_i64_ty],
				[llvm_i64_ty, llvm_i32_ty, llvm_i64_ty],
				[IntrNoMem, ImmArg<ArgIndex<1>>]>;

				// Authenticate a signed pointer and re-sign it.
				// The second (key) and third (discriminator) arguments specify the signing
				// scheme used for authenticating.
				// The fourth and fifth arguments specify the scheme used for signing.
				// The signature must be valid.
				// This is a combined form of @llvm.ptrauth.auth and @llvm.ptrauth.sign,
				// with an additional integrity guarantee on the intermediate value.
				def int_ptrauth_resign : Intrinsic<[llvm_i64_ty],
				[llvm_i64_ty, llvm_i32_ty, llvm_i64_ty,
				llvm_i32_ty, llvm_i64_ty],
				[IntrNoMem, ImmArg<ArgIndex<1>>,
				ImmArg<ArgIndex<3>>]>;

				// Strip the embedded signature out of a signed pointer.
				// The second argument specifies the key.
				// This behaves like @llvm.ptrauth.auth, but doesn't require the signature to
				// be valid.
				def int_ptrauth_strip : Intrinsic<[llvm_i64_ty],
				[llvm_i64_ty, llvm_i32_ty],
				[IntrNoMem, ImmArg<ArgIndex<1>>]>;

				// Blend a small integer discriminator with an address discriminator, producing
				// a new discriminator value.
				def int_ptrauth_blend : Intrinsic<[llvm_i64_ty],
				[llvm_i64_ty, llvm_i64_ty],
				[IntrNoMem]>;

				// Compute the signature of a value, using a given discriminator.
				// This differs from @llvm.ptrauth.sign in that it doesn't embed the computed
				// signature in the pointer, but instead returns the signature as a value.
				// That allows it to be used to sign non-pointer data: in that sense, it is
				// generic. There is no generic @llvm.ptrauth.auth: instead, the signature
				// can be computed using @llvm.ptrauth.sign_generic, and compared with icmp.
				def int_ptrauth_sign_generic : Intrinsic<[llvm_i64_ty],
				[llvm_i64_ty, llvm_i64_ty],
				[IntrNoMem]>;

	//===----------------------------------------------------------------------===//

	//===----------------------------------------------------------------------===//
	// Target-specific intrinsics
	//===----------------------------------------------------------------------===//

	include "llvm/IR/IntrinsicsPowerPC.td"
	include "llvm/IR/IntrinsicsX86.td"
	…