Introduce llvm.nospeculateload intrinsic
Needs Review · Public

Authored by kristof.beyls on Jan 5 2018, 3:07 AM.

Details

Summary

Recently, Google Project Zero disclosed several classes of attack
against speculative execution. One of these, known as variant-1
(CVE-2017-5753), allows explicit bounds checks to be bypassed under
speculation, providing an arbitrary read gadget. Further details can be
found on the GPZ blog [1].

This patch introduces a new LLVM-IR intrinsic, called
llvm.nospeculateload, which enables the implementation of the new
clang-level builtin __builtin_load_no_speculate, see review
https://reviews.llvm.org/D41760.

This new intrinsic provides a mechanism for limiting speculation by a
CPU after a bounds-checked memory access. We've tried to design this in
such a way that it can be used for any target where this might be
necessary. The patch consists of both target-specific functionality
for Arm and AArch64 code generation, and target-independent
functionality that other targets can reuse.

[1] More information on the topic can be found here:
https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
Arm specific information can be found here:
https://www.arm.com/security-update

Diff Detail

kristof.beyls created this revision. Jan 5 2018, 3:07 AM

Just as an FYI, we have been experimenting with similar APIs ourselves. We developed two candidate alternative APIs that, IMO, seem substantially better than this.

Sadly, we've just not had the time to prepare them for publication (in part due to the unexpected early disclosure). At least on x86, these are likely to be significantly cheaper. I suspect the same will be true on ARM. They are also likely to be significantly easier to deploy, based on our experience auditing a few quite large systems where this is relevant.

I don't really want to hold this up if it needs to land very quickly though. Folks on our end will be working Very Rapidly on at least sharing the design we have in mind if not a complete implementation. Hopefully early next week, but we're still playing a bit of catch-up....

emaste added a subscriber: emaste. Jan 5 2018, 4:01 AM
reames added a subscriber: reames. Jan 5 2018, 11:45 AM

A design variation on this which may be worth considering is to phrase this as a speculative use barrier. That is, don't include the load at all; simply provide an intrinsic which guarantees that the result of the load (or any other instruction) will not be consumed by a speculative use with potentially side-channel-visible side effects.

i.e. restructure the intrinsic with the following form:
declare T @llvm.nospeculate(T %value)
declare T @llvm.nospeculate(T %value, i1 %spec_condition)

(The latter variant is for when the problematic condition to speculate on is known, but it has unresolved design challenges around CSE of conditions. TBD)

An example using the former would be:
%val = load i32, i32* %potentially_out_bounds
%val.forced = call T @llvm.nospeculate(T %val)
use %val.forced

I'm still thinking through what this would lower to on x86, but I think we can find cheapish instruction sequences which force the first load to retire before the use, or treat this as a scheduling constraint.

Thanks for the feedback, Chandler and Philip!

Please let me explain how we've ended up with the API design as proposed.

We started from the observation that:

  1. We need an API/intrinsic that can be implemented well in all major compilers, including gcc and clang/llvm.
  2. We need an API/intrinsic that works well across architectures.

For Arm, the recommended mitigation for protecting against Spectre variant 1 is to generate a LOAD->CSEL->CSDB instruction sequence, where CSDB is a new barrier instruction.
This sequence gives protection on all Arm implementations.
This is explained in far more detail at https://developer.arm.com/support/security-update/download-the-whitepaper, in section "Software mitigations", pages 4-8.

The need to generate the full LOAD->CSEL->CSDB sequence explains why the proposed intrinsic contains the semantics of loading a value, provided it is within bounds.
Being able to generate the LOAD->CSEL->CSDB sequence from the intrinsic is essential for AArch64 and ARM targets.

Hopefully that explains the need for the ptr, lower_bound and upper_bound parameters.

The cmpptr and failval parameters are there to make it easier to use in certain scenarios, for example:

For failval, the idea is that for code like

if (ptr >= base && ptr < limit) // bounds check
  return *ptr;
return FAILVAL;

to be able to be easily rewritten as

return __builtin_load_no_speculate (ptr, base, limit, FAILVAL);
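For illustration, the architectural (non-speculative) part of these semantics can be modelled in plain C++. The helper name below is hypothetical; the real builtin additionally constrains speculative execution, which standard C++ cannot express:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: returns *ptr if ptr lies within [lower, upper),
// and failval otherwise. Only the architectural behaviour is modelled;
// the speculation barrier itself would be supplied by the compiler backend.
template <typename T>
T load_no_speculate_model(const T *ptr, const void *lower, const void *upper,
                          T failval) {
  std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
  std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(lower);
  std::uintptr_t hi = reinterpret_cast<std::uintptr_t>(upper);
  if (p >= lo && p < hi)
    return *ptr;   // in bounds: perform the load
  return failval;  // out of bounds: no load is performed
}
```

Note how the bounds check subsumes the user-level check in the original code, which is exactly why the duplicated-check concern discussed later arises.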

The cmpptr parameter was introduced after hearing a need from the Linux kernel. They have some cases where cmpptr may be a pointer to an atomic type and want to do something like

if (cmpptr >= lower && cmpptr < upper)
  val = __atomic_read_and_inc (cmpptr);

By separating out cmpptr from ptr you can now write the following, which removes the need to try and wrestle "no-speculate" semantics into the atomic builtins:

if (cmpptr >= lower && cmpptr < upper)
  {
    T tmp_val = __atomic_read_and_inc (cmpptr);
    val = __builtin_load_no_speculate (&tmp_val, lower, upper, 0,
                                       cmpptr);
  }

There is a bit more explanation on the rationale for the failval and cmpptr parameters at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00594.html.

Furthermore, there are a few more details on the use of this intrinsic at https://developer.arm.com/support/security-update/compiler-support-for-mitigations

I hope the above helps to explain the rationale for the proposed API design.

The API design has been discussed over the past weeks in detail on the gcc mailing list. As a result of that, we propose to adapt the API, to enable efficient code generation also on architectures that need to generate a barrier instruction to achieve the desired semantics.

The main change in the proposed API is to drop the failval parameter and to tweak the semantics as described below.
There is a more detailed rationale for these changes at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01546.html

I haven't updated the code to implement the new specification yet, but thought I'd share the new specification as soon as possible, while I find the time to adapt the implementation:

This is an overloaded intrinsic. You can use llvm.speculationsafeload on any integer type that can legally be loaded on the target, and any pointer type. However, not all targets support this intrinsic at the moment.

declare T @llvm.speculationsafeload.T(T* %ptr, i8* %lower_bound, i8* %upper_bound, i8* %cmpptr)

Arguments:
""""""""""

The first argument is the pointer to load from. The second and third arguments are pointers used as the lower and upper bounds. The fourth argument is the pointer that is compared against those bounds.

Semantics:
""""""""""

  • When the intrinsic is not being executed speculatively:
    • if %lower_bound <= %cmpptr < %upper_bound, the value at address %ptr is returned.
    • if %cmpptr is not within these bounds, the behaviour is undefined.
  • When the intrinsic is being executed speculatively, either:
    • Execution of instructions following the intrinsic that have a dependency on the result of the intrinsic will be blocked until the intrinsic is no longer executing speculatively. At this point, the non-speculative semantics above apply.
    • Speculation may continue using the value at address %ptr as the return value of the intrinsic, if %lower_bound <= %cmpptr < %upper_bound, or an unspecified constant value if %cmpptr is outside these bounds.
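To make the non-speculative case concrete, here is a minimal C++ model (hypothetical helper name; an out-of-bounds %cmpptr is undefined behaviour in the proposal, so the model asserts instead of invoking it):

```cpp
#include <cassert>
#include <cstdint>

// Models only the non-speculative semantics: return *ptr provided
// lower_bound <= cmpptr < upper_bound. The guarantees under speculative
// execution cannot be expressed in standard C++.
template <typename T>
T speculation_safe_load_model(const T *ptr, const void *lower_bound,
                              const void *upper_bound, const void *cmpptr) {
  std::uintptr_t cmp = reinterpret_cast<std::uintptr_t>(cmpptr);
  std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(lower_bound);
  std::uintptr_t hi = reinterpret_cast<std::uintptr_t>(upper_bound);
  // Out-of-bounds cmpptr is undefined behaviour per the proposed
  // semantics; the model traps instead.
  assert(lo <= cmp && cmp < hi);
  return *ptr;
}
```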

It would be awesome to have static analysis rules to help identify *where* to put these intrinsics. Is somebody working on that? Or did I miss it?

"When the intrinsic is being executed speculatively"

What does this mean?

LLVM IR defines the semantics of a program only in terms of visible side-effects. It does not define any semantics for code which does not execute, and it does not provide any guarantees in terms of what data can be leaked via side-channels. If you're going to attach meaningful semantics to speculationsafeload, you also need to generally define what code the compiler is allowed to emit in blocks which could be speculatively executed. As an extreme example, there needs to be some rule which prevents the compiler from inserting an attackable load immediately before a call to speculationsafeload.

"When the intrinsic is being executed speculatively"

What does this mean?

LLVM IR defines the semantics of a program only in terms of visible side-effects. It does not define any semantics for code which does not execute, and it does not provide any guarantees in terms of what data can be leaked via side-channels. If you're going to attach meaningful semantics to speculationsafeload, you also need to generally define what code the compiler is allowed to emit in blocks which could be speculatively executed. As an extreme example, there needs to be some rule which prevents the compiler from inserting an attackable load immediately before a call to speculationsafeload.

Hi Eli,

Thanks for your feedback and insightful comments.

Regarding "visible side-effects", I think one could argue that the speculative execution of a load can lead to a visible side-effect, as demonstrated by Spectre, even though LLVM IR doesn't define the concept of speculative execution.

Whether code that executes on a CPU due to mis-speculation is "code that executes" or "code that does not execute" depends on the point of view, IMHO.
From the actual micro-architectural execution on the CPU point-of-view, it does execute, even if it is speculatively.
From the model of the execution that LLVM (and probably most other compilers) use, it assumes that execution doesn't happen.
I wouldn't say that code that is executed due to mis-speculation "doesn't happen", but rather that LLVM-IR's model isn't (currently?) capable of representing that.

Should LLVM-IR be extended to be able to represent (some forms of) mis-speculation at some point? I honestly don't know if there would be much value in that.

All of the above being said, with this intrinsic, we aim to make it possible for software developers to protect specific pieces of source code against Spectre variant 1 attacks, and hence would like to get it into both clang and gcc soon.

I do think your point around "speculative execution" not being modelled in LLVM-IR is valid. Maybe the semantics of this intrinsic could be specified avoiding the use of the concept "speculative execution" as much as possible.
It might not be possible to avoid it completely, as protection against side-channel leaks happening during mis-speculation is the raison d'être for this intrinsic.
Anyway if that would at least address one of your concerns, I'll look into that.

I believe another concern you raise is about what code transformations compilers may do that could render the protection of such an intrinsic less effective.
Did you have any specific example in mind?
In the implementation, we give the compiler very little knowledge of exactly what is loaded through the intrinsic, to reduce the probability that it would itself introduce non-protected loads to the same location or derived from it.
I'm wondering if you had been thinking of a potential example where this wouldn't be enough?

Thanks!

Kristof

It would be awesome to have static analysis rules to help identify *where* to put these intrinsics. Is somebody working on that? Or did I miss it?

Indeed it would be awesome to have static analysis to help identify where these intrinsics should be put. The (non-public) experiments I've seen seem to indicate that it's hard to get a low false-positive rate, probably at least in part because static analyzers lack information on which variables can be influenced by external input and which cannot.

I don't think it's likely the compiler would intentionally introduce a load using the same pointer as the operand to a speculationsafeload; given most transforms don't understand what speculationsafeload does, the compiler has no reason to introduce code like that (even if it isn't technically prohibited by LangRef).

More practically, I'm worried about the possibility that code which doesn't appear to be vulnerable at the source-code level will get lowered to assembly which is vulnerable. For example, the compiler could produce code where the CPU speculates a load from an uninitialized pointer value. Without an IR/MIR model for speculation, we have no way to prevent this from happening.

Adding a few folks on my team who will hopefully take over helping here (my time has vanished). Note that they aren't on llvm-commits, so please reply-to-all etc.

Also, for those joining, a link to the mailing list archive:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180115/516684.html

(I hope you can bear with me while I come up to speed; I understand there's some time pressure here, so I'm erring on the side of speaking sooner)

The proposed API for nospeculateload seems like it could be problematic for programmers, because it gives them no way to tell whether the bounds check passed, unless they are able to identify a failval sentinel that can never be the result of a successful load (and yet is safe to use as the speculative result of the load). Thus, it seems likely that in a lot of cases, the bounds check will be duplicated: once inside nospeculateload, and once in user code. That will make code using this API "feel" inefficient, which will tend to discourage programmers from using it. Furthermore, if it actually is inefficient, that will create pressure for optimization passes to "peek inside" nospeculateload in order to eliminate the duplication, and that seems like a can of worms we really don't want to open.

Another way of putting it is that we probably want this API to be as primitive as possible, because the more logic we put inside the intrinsic, the greater the risk that some parts of it will be unsuited for some users. Consequently, we've been experimenting with APIs that are concerned solely with the bounds check, rather than with the subsequent load:

int64_t SecureBoundedOffset(int64_t offset, int64_t bound);

At the abstract machine level, this function just returns offset, and has the precondition that 0 <= offset < bound, but it also ensures that for speculative executions, offset will always appear to be within those bounds. There are also variants for uint64_t, and variants that take the base-2 log of the bound, for greater efficiency when the bound is a power of two.
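A plausible branchless sketch of that first variant (hypothetical code, not the actual implementation): the offset is forced to zero whenever it lies outside [0, bound), through a data dependency rather than a conditional branch, so a mis-predicted branch cannot bypass the check:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: clamp offset to [0, bound) via masking.
// Because the result depends on the comparison through data flow rather
// than control flow, a mis-speculated branch cannot observe an
// out-of-range offset.
std::int64_t secure_bounded_offset_sketch(std::int64_t offset,
                                          std::int64_t bound) {
  std::uint64_t u = static_cast<std::uint64_t>(offset);
  std::uint64_t b = static_cast<std::uint64_t>(bound);
  // mask is all-ones when 0 <= offset < bound (the unsigned compare also
  // rejects negative offsets), all-zeros otherwise.
  std::uint64_t mask = -static_cast<std::uint64_t>(u < b);
  return static_cast<std::int64_t>(u & mask);
}
```

A caveat for any such sketch: a compiler that does not cooperate is in principle free to turn the mask back into a branch, which is part of why intrinsic-level support is being discussed at all.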

template <typename T, typename... ZeroArgs>
bool IsPointerInRange(T*& pointer, T* begin, T* end, ZeroArgs... args);

This function returns whether pointer is between begin and end, and also guarantees that if the function returns false, then any speculative execution which assumes it to be true will treat pointer and args... as zero (all ZeroArgs must be integers or pointers). Notice that this API allows the optimizer to hoist loads past the branch, so long as the loads don't depend on pointer or args...; I'm not sure if that's true of nospeculateload or SecureBoundedOffset.

Notice that by not handling the load itself, these APIs avoid the ptr/cmpptr awkwardness, as well as the need for the user to designate a sentinel value. One tradeoff is that whereas nospeculateload can be thought of as a conditional load, plus some "security magic", these APIs are harder to understand without explicitly thinking about speculative execution. However, I'm not sure that's much of a problem in practice-- if you don't want to think about speculative execution, you probably shouldn't be using this API in the first place.

Is there any way we could implement an interface like those efficiently on ARM?

I don't think it's likely the compiler would intentionally introduce a load using the same pointer as the operand to a speculationsafeload; given most transforms don't understand what speculationsafeload does, the compiler has no reason to introduce code like that (even if it isn't technically prohibited by LangRef).

Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what speculationsafeload does?

More practically, I'm worried about the possibility that code which doesn't appear to be vulnerable at the source-code level will get lowered to assembly which is vulnerable. For example, the compiler could produce code where the CPU speculates a load from an uninitialized pointer value. Without an IR/MIR model for speculation, we have no way to prevent this from happening.

When you say "code which doesn't appear to be vulnerable at the source-code level", do you mean "code that is explicitly protected by speculationsafeload", or "code that doesn't appear to need speculationsafeload in the first place"? If the former, could you give a more concrete example?

It seems to me that we ought to be able to specify this intrinsic without having an explicit model of CPU speculation in the IR, because at the IR level we can just constrain the types of transformations that can be performed on this intrinsic. That way we only need to talk about speculation when we're specifying how this intrinsic is used to generate code for a specific platform that happens to feature CPU-level branch speculation. Very tentatively, I think the specific constraints on transformations that are needed here are that the intrinsic has unmodeled side-effects (so it can't be eliminated or executed speculatively), and that it should be treated as writing to all memory (or only to pointer and args... in the case of an API like IsPointerInRange).

template <typename T, typename... ZeroArgs>
bool IsPointerInRange(T*& pointer, T* begin, T* end, ZeroArgs... args);

This function returns whether pointer is between begin and end, and also guarantees that if the function returns false, then any speculative execution which assumes it to be true will treat pointer and args... as zero (all ZeroArgs must be integers or pointers). Notice that this API allows the optimizer to hoist loads past the branch, so long as the loads don't depend on pointer or args...; I'm not sure if that's true of nospeculateload or SecureBoundedOffset.

Chandler points out offline that this can work for any predicate, not just bound checking, so perhaps the API could instead be something like:

template <typename... ZeroArgs>
bool ProtectFromSpeculation(bool predicate, ZeroArgs&... args);

with the semantics that if predicate is false, speculative execution that treats it as true will also treat args... as zero. This has the arguable problem that, like SecureBoundedOffset, it's a no-op as far as the C++ abstract machine is concerned, so it can't really be explained without talking about speculation, but it's quite simple and general, and I'm not at all convinced that the connection of this API to speculative execution is something we should hide.
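For concreteness, here is a hypothetical sketch of how such an API might be used. The model below captures only the abstract-machine behaviour (it simply returns the predicate); the speculative zeroing of args cannot be expressed in standard C++ and would have to be supplied by the compiler:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: architecturally a no-op on args. The real intrinsic
// would additionally guarantee that a mis-speculated execution assuming
// the predicate true observes args as zero.
template <typename... ZeroArgs>
bool protect_from_speculation_model(bool predicate, ZeroArgs &.../*args*/) {
  return predicate;
}

// Typical use: guard an index before using it to load.
int guarded_lookup(const int *table, std::size_t size, std::size_t idx) {
  if (protect_from_speculation_model(idx < size, idx))
    return table[idx];  // under mis-speculation, idx would be seen as zero
  return -1;
}
```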

Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what speculationsafeload does?

Prohibit what, exactly? According to current LangRef rules, it's legal to introduce a dead load to an arbitrary pointer (even if the compiler can't prove it's dereferencable).

do you mean "code that is explicitly protected by speculationsafeload", or "code that doesn't appear to need speculationsafeload in the first place"?

Both, I guess?

I guess I'll describe the uninitialized pointer problem in a little more detail. The idea is that you have code roughly like this:

// Code in a function; g is a global int.
bool b = f1();
uint8_t* p;
if (b)
    p = &g;
g = 10;
if (b)
    f4(user_array[*p]);

If b is true, we load user_array[10]. If b is false, we speculate user_array[*p], where p is uninitialized (i.e. user-controlled, if you're unlucky). You now have a variant-1 attack.

There's a lot of potential variants of this. For example, instead of user_array, we have another speculatively-uninitialized pointer. Or the load which leaks the data to the user could actually be a speculationsafeload, intended to stop a different variant-1 attack. Or the code could be spread over multiple functions. Or the if statement might not be an if statement (there are lots of ways to get a conditional branch in assembly). Or "p" might be a pointer to a constant pool, so the load-from-undef isn't written in the source code at all.

Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what speculationsafeload does?

Prohibit what, exactly? According to current LangRef rules, it's legal to introduce a dead load to an arbitrary pointer (even if the compiler can't prove it's dereferencable).

I'm basically just asking if we should have some form of assurance stronger than "most transforms don't understand this intrinsic well enough to violate the invariants it relies on". I have no idea what form that assurance would take, since I don't know how LangRef handles such matters.

I guess I'll describe the uninitialized pointer problem in a little more detail. The idea is that you have code roughly like this:

// Code in a function; g is a global int.
bool b = f1();
uint8_t* p;
if (b)
    p = &g;
g = 10;
if (b)
    f4(user_array[*p]);

If b is true, we load user_array[10]. If b is false, we speculate user_array[*p], where p is uninitialized (i.e. user-controlled, if you're unlucky). You now have a variant-1 attack.

Hmm. If I understand correctly, the ProtectFromSpeculation() API I mentioned earlier could guard against this, by including p in the variadic list of clobbers.

There's a lot of potential variants of this. For example, instead of user_array, we have another speculatively-uninitialized pointer. Or the load which leaks the data to the user could actually be a speculationsafeload, intended to stop a different variant-1 attack. Or the code could be spread over multiple functions. Or the if statement might not be an if statement (there are lots of ways to get a conditional branch in assembly). Or "p" might be a pointer to a constant pool, so the load-from-undef isn't written in the source code at all.

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation? The only possibility I'm coming up with is that the load has additional safety requirements besides the speculationsafeload bounds check. If so, that might argue for an API that lets you express arbitrary predicates (like ProtectFromSpeculation), rather than just upper/lower bounds checks.

I don't see how the code being spread over multiple functions matters- all that matters are the load, and the branch (or nested branches) that actually guard that load, not any prior branches on the same condition. As for the if-statement not being an if-statement, that's true to the extent that in principle it could be a switch or a loop, but it has to be some kind of conditional control flow that's explicit in user code. This attack is only a platform-level issue because the attacker is exploiting features of the platform to observe the effects of a load that the application-level logic says cannot happen. If the application logic doesn't explicitly prevent any of the loads the attacker is exploiting, that's a plain old application-level vulnerability that LLVM neither can nor should fix.

So it seems to me that all the variations you identify are either application-level vulnerabilities, or can be straightforwardly blocked by something like ProtectFromSpeculation (I'm not quite sure about speculationsafeload).

I have no idea what form that assurance would take, since I don't know how LangRef handles such matters.

Well, I don't really know either; LangRef only describes the abstract virtual machine, mostly. That's part of the problem. :)

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation

Sorry, wasn't quite clear. There are two speculated loads for a variant-1 attack: the load that reads the secret, and the load that leaks the secret to the user. The first load has to be speculation-safe to stop the attack; whether the second is speculation-safe is irrelevant, at least in the proposals so far. That isn't really a fundamental problem, just an illustration that reasoning about these attacks is tricky.

I don't see how the code being spread over multiple functions matters- all that matters are the load, and the branch (or nested branches) that actually guard that load

Well, the CPU doesn't care (assuming it can perfectly predict calls), but it's problematic from an auditing perspective because it's harder to spot. Particularly since the vulnerable code might not explicitly reference any user-controlled data at all.

If the application logic doesn't explicitly prevent any of the loads the attacker is exploiting

A C programmer cannot reasonably come up with a complete list of all the potentially exploitable loads without the compiler being aware that the user needs Spectre-resistant code. There are two key problems:

  1. An exploitable load might not exist in the original program. One example is the switch-to-lookup-table transform. Given:
int a(unsigned x) {
  switch (x) {
  case 0: return 2;
  case 1: return 44;
  case 2: return 23;
  default: return 8;
  }
}

We transform to:

int a(unsigned x) {
  if (x > 2)
    return 8;
  const static int table[] = {2, 44, 23};
  return table[x];
}

Now you have a speculated out-of-bounds load from code which didn't contain any loads.

  2. The exploitable code might come out of some non-obvious lowering. Even if a pointer points to something "obviously" safe, it might actually be uninitialized along some impossible path through the function. Probably the easiest example to explain is Polly's invariant load hoisting. Basically, if you have a loop like this:
int sum = 0;
for (int i = 0; i < n; ++i) {
  if (b)
    sum += (*p)[i];
}

It gets transformed to something like this:

int *pp;
if (b) pp = *p;
int sum = 0;
for (int i = 0; i < n; ++i) {
  if (b)
    sum += pp[i];
}

So now you have the if() if() pattern which leads to a speculated load from an uninitialized pointer.

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation

Sorry, wasn't quite clear. There are two speculated loads for a variant-1 attack: the load that reads the secret, and the load that leaks the secret to the user. The first load has to be speculation-safe to stop the attack; whether the second is speculation-safe is irrelevant, at least in the proposals so far. That isn't really a fundamental problem, just an illustration that reasoning about these attacks is tricky.

Wait, no, this isn't right. It is actually possible to attack a speculationsafeload, at least in theory.

In general, the logic behind the "soft" speculation barriers being proposed is that they block execution if a speculation check fails. But given that the speculation check is itself computed speculatively, you have to ensure that not only is the result correct in the normal sense, but also that it's correct when computed speculatively. And as I showed earlier, IR transforms can turn an arbitrary value into a speculatively-uninitialized value. Therefore, the speculation check can spuriously succeed, and the speculation barrier doesn't reliably prevent speculation.

Thanks very much for sharing this, Geoff!
I have a few immediate questions/thoughts, and wonder what you and others here think about them:

  1. Lowering to various instruction sets.

I think one of the questions to ask is whether the APIs here can be lowered well to different instruction sets.
I believe that may be possible for Arm, but I'm still looking into it. It would be useful for us to have a few examples of how these APIs are envisioned to be used in practice, to make sure we understand the proposal well enough. E.g. maybe a few examples in the same spirit as under “More complex cases” at https://developer.arm.com/support/security-update/compiler-support-for-mitigations? Do you happen to have a suggestion of how this intrinsic would best be lowered on some instruction sets other than Arm? There was quite a bit of debate about the ability to efficiently lower the intrinsic in our proposal on the gcc mailing list, e.g. see https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01136.html.

  2. Is the API available in C too?

IIUC, the SecureBoundedOffset intrinsic is usable from both C and C++, but the IsPointerInRange intrinsic can only be used from C++? Do you have ideas around bringing similar functionality to C?

  3. For the variant with a general predicate (bool ProtectFromSpeculation(bool predicate, ZeroArgs&... args);); do you have ideas about how to make sure that the optimizers in the compiler don’t simplify the predicate too much?

For example:

if (a>7) {
  x = a;
  if (ProtectFromSpeculation(a>5, x)) {
     ... = v[x];
     ...
  }
}

how to prevent this from being optimized to:

if (a>7) {
  x = a;
  if (ProtectFromSpeculation(true, x)) {
    ... = v[x];
    ...
  }
}

which no longer provides protection.

I have no idea what form that assurance would take, since I don't know how LangRef handles such matters.

Well, I don't really know either; LangRef only describes the abstract virtual machine, mostly. That's part of the problem. :)

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation

Sorry, wasn't quite clear. There are two speculated loads for a variant-1 attack: the load that reads the secret, and the load that leaks the secret to the user. The first load has to be speculation-safe to stop the attack; whether the second is speculation-safe is irrelevant, at least in the proposals so far. That isn't really a fundamental problem, just an illustration that reasoning about these attacks is tricky.

I don't see how the code being spread over multiple functions matters - all that matters is the load, and the branch (or nested branches) that actually guards that load

Well, the CPU doesn't care (assuming it can perfectly predict calls), but it's problematic from an auditing perspective because it's harder to spot. Particularly since the vulnerable code might not explicitly reference any user-controlled data at all.

If the application logic doesn't explicitly prevent any of the loads the attacker is exploiting

A C programmer cannot reasonably come up with a complete list of all the potentially exploitable loads without the compiler being aware that the user needs Spectre-resistant code. There are two key problems:

  1. An exploitable load might not exist in the original program. One example is the switch-to-lookup-table transform. Given:

    int a(unsigned x) {
      switch (x) {
      case 0: return 2;
      case 1: return 44;
      case 2: return 23;
      default: return 8;
      }
    }

    We transform to:

    int a(unsigned x) {
      if (x > 2) return 8;
      const static int table[] = {2, 44, 23};
      return table[x];
    }

    Now you have a speculated out-of-bounds load from code which didn't contain any loads.
  2. The exploitable code might come out of some non-obvious lowering. Even if a pointer points to something "obviously" safe, it might actually be uninitialized along some impossible path through the function. Probably the easiest example to explain is Polly's invariant load hoisting. Basically, if you have a loop like this:

    int sum = 0;
    for (int i = 0; i < n; ++i) {
      if (b) sum += (*p)[i];
    }

    It gets transformed to something like this:

    int *pp;
    if (b) pp = *p;
    int sum = 0;
    for (int i = 0; i < n; ++i) {
      if (b) sum += pp[i];
    }

    So now you have the if() if() pattern which leads to a speculated load from an uninitialized pointer.

Thanks for the clear examples, Eli.
I wonder if anyone on this thread has already thought about whether it would be practically possible to make those transformations not introduce such a pattern, e.g. under a specific code generation option? Or, if a transformation does introduce such a pattern, whether it would be feasible for it to make use of whatever intrinsic we end up with?

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation

Sorry, wasn't quite clear. There are two speculated loads for a variant-1 attack: the load that reads the secret, and the load that leaks the secret to the user. The first load has to be speculation-safe to stop the attack; whether the second is speculation-safe is irrelevant, at least in the proposals so far. That isn't really a fundamental problem, just an illustration that reasoning about these attacks is tricky.

Wait, no, this isn't right. It is actually possible to attack a speculationsafeload, at least in theory.

In general, the logic behind the "soft" speculation barriers being proposed is that they block execution if a speculation check fails. But given that the speculation check is itself computed speculatively, you have to ensure that not only is the result correct in the normal sense, but also that it's correct when computed speculatively. And as I showed earlier, IR transforms can turn an arbitrary value into a speculatively-uninitialized value. Therefore, the speculation check can spuriously succeed, and the speculation barrier doesn't reliably prevent speculation.

I've found the last sentence a bit cryptic. Even though I think I understand what you're pointing out here, I would like to check that my understanding is correct with the following example (using the SecureBoundedOffset API @gromer proposed):

if (Wednesday) {
   limit = 2;
} else {
   limit = 3;
}

if (a < 0 || a >= limit)
   return;

a = SecureBoundedOffset(a, limit);
v = b[a];

Here limit itself can end up with a value based on misspeculation (e.g. what happens when Wednesday is true, a==2, and both if statements get mispredicted).
I hope this is an example of what you were referring to @eli.friedman? Or did I misunderstand?
If so, I think there are ways to use the intrinsic multiple times to protect against this - but I'm not sure that's how you envision the use of the intrinsic, @gromer?
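To make the "use the intrinsic multiple times" idea concrete, here is a sketch in which SecureBoundedOffset is stubbed out with only its architectural semantics (the real version would be a compiler builtin carrying the speculative guarantee). The names and the static bound of 4 are hypothetical, and I'm not certain this composition actually closes the window described above; it only illustrates the shape:

```cpp
#include <cassert>
#include <cstdint>

// Stub: architectural semantics only. Returns offset, assuming
// 0 <= offset < bound. The proposed builtin would additionally guarantee
// that offset appears in-bounds under speculative execution.
static int64_t SecureBoundedOffset(int64_t offset, int64_t bound) {
  assert(offset >= 0 && offset < bound);
  return offset;
}

// Hypothetical use: clamp limit itself first (4 is an assumed static
// upper bound on limit), then clamp the index against the clamped limit.
static int64_t lookup(bool Wednesday, int64_t a, const int64_t *b) {
  int64_t limit = Wednesday ? 2 : 3;
  if (a < 0 || a >= limit)
    return -1;
  limit = SecureBoundedOffset(limit, 4);
  a = SecureBoundedOffset(a, limit);
  return b[a];
}
```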

I hope this is an example of what you were referring to @eli.friedman? Or did I misunderstand?

That's the basic idea, yes. (Of course, like I mentioned earlier, there are a variety of ways to end up with assembly like this even if there isn't an explicit if statement in the source code.)

(I hope you can bear with me while I come up to speed; I understand there's some time pressure here, so I'm erring on the side of speaking sooner)

The proposed API for nospeculateload seems like it could be problematic for programmers, because it gives them no way to tell whether the bounds check passed, unless they are able to identify a failval sentinel that can never be the result of a successful load (and yet is safe to use as the speculative result of the load). Thus, it seems likely that in a lot of cases, the bounds check will be duplicated: once inside nospeculateload, and once in user code. That will make code using this API "feel" inefficient, which will tend to discourage programmers from using it. Furthermore, if it actually is inefficient, that will create pressure for optimization passes to "peek inside" nospeculateload in order to eliminate the duplication, and that seems like a can of worms we really don't want to open.

Another way of putting it is that we probably want this API to be as primitive as possible, because the more logic we put inside the intrinsic, the greater the risk that some parts of it will be unsuited for some users. Consequently, we've been experimenting with APIs that are concerned solely with the bounds check, rather than with the subsequent load:

int64_t SecureBoundedOffset(int64_t offset, int64_t bound);

At the abstract machine level, this function just returns offset, and has the precondition that 0 <= offset < bound, but it also ensures that for speculative executions, offset will always appear to be within those bounds. There are also variants for uint64_t, and variants that take the base-2 log of the bound, for greater efficiency when the bound is a power of two.

template <typename T, typename... ZeroArgs>
bool IsPointerInRange(T*& pointer, T* begin, T* end, ZeroArgs... args);

This function returns whether pointer is between begin and end, and also guarantees that if the function returns false, then any speculative execution which assumes it to be true will treat pointer and args... as zero (all ZeroArgs must be integers or pointers). Notice that this API allows the optimizer to hoist loads past the branch, so long as the loads don't depend on pointer or args...; I'm not sure if that's true of nospeculateload or SecureBoundedOffset.

Notice that by not handling the load itself, these APIs avoid the ptr/cmpptr awkwardness, as well as the need for the user to designate a sentinel value. One tradeoff is that whereas nospeculateload can be thought of as a conditional load, plus some "security magic", these APIs are harder to understand without explicitly thinking about speculative execution. However, I'm not sure that's much of a problem in practice-- if you don't want to think about speculative execution, you probably shouldn't be using this API in the first place.
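For illustration, a possible usage pattern for IsPointerInRange, with a stub that carries only the architectural semantics (the speculative zeroing guarantee obviously cannot be expressed in plain C++; the function and variable names below are hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Stub: architectural semantics only. The proposed builtin would also
// guarantee that, on a mispredicted 'true', speculative execution sees
// pointer (and any extra args) as zero.
template <typename T>
static bool IsPointerInRange(T *&pointer, T *begin, T *end) {
  return pointer >= begin && pointer < end;
}

static int read_if_in_range(int *p, int *buf, size_t n) {
  if (IsPointerInRange(p, buf, buf + n))
    return *p; // guarded load; under the guarantee, a misspeculated
               // 'true' would see p as zero rather than out-of-range
  return -1;
}
```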

Thanks very much for sharing this, Geoff!
I have a few immediate questions/thoughts, and wonder what you and others here think about them:

  1. Lowering to various instruction sets.

    I think one of the questions to ask is whether the APIs here can be lowered well to different instruction sets. I believe that may be possible for Arm, but I'm still looking into it. It would be useful for us to have a few examples of how these APIs are envisioned to be used in practice, to make sure we understand the proposal well enough. E.g. maybe a few examples in the same spirit as under “More complex cases” at https://developer.arm.com/support/security-update/compiler-support-for-mitigations? Do you happen to have a suggestion of how this intrinsic would best be lowered on some instruction sets other than Arm? There was quite a bit of debate about the ability to efficiently lower the intrinsic in our proposal on the gcc mailing list, e.g. see https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01136.html.

Focusing on the ProtectFromSpeculation API style, it can be lowered *very* efficiently on x86 at least:

  xor  %zero_reg, %zero_reg
  ...
  cmpq $42, %a
  jb below
  ja above
equal:
  cmovne %zero_reg, %arg0
  cmovne %zero_reg, %arg1
  ...
  cmovne %zero_reg, %argN
  ...

below:
  cmovnb %zero_reg, %arg0
  cmovnb %zero_reg, %arg1
  ...
  cmovnb %zero_reg, %argN
  ...

above:
  cmovna %zero_reg, %arg0
  cmovna %zero_reg, %arg1
  ...
  cmovna %zero_reg, %argN
  ...

Especially in the common case of 1 or 2 arguments needing to be zeroed this ends up being quite nice code generation using the existing structure of the conditional branch.
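The conditional zeroing the cmov sequence performs can be mimicked at the source level with branchless masking; this is a sketch of the idea only (as noted below, the real pattern has to come from the code generator so the optimizer cannot fold it away), and the helper name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// If pred is false, force value to zero without a branch, mirroring what
// the cmovs do when the flags disagree with the predicted direction.
static uint64_t zero_unless(bool pred, uint64_t value) {
  // mask is all-ones when pred is true, all-zeros when pred is false
  // (unsigned wraparound is well-defined).
  uint64_t mask = ~(static_cast<uint64_t>(pred) - 1);
  return value & mask;
}
```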

  2. Is the API available in C too?

    IIUC, the SecureBoundedOffset intrinsic is usable from both C and C++, but the IsPointerInRange intrinsic can only be used from C++? Do you have ideas around bringing similar functionality to C?

Yeah, this is part of why I suggest the much more generic ProtectFromSpeculation API, which I think is easily applicable in C. The C version might use pointers or whatever, but this kind of API doesn't fundamentally require any interesting language facilities or extensions.

  3. For the variant with a general predicate (bool ProtectFromSpeculation(bool predicate, ZeroArgs&... args);); do you have ideas about how to make sure that the optimizers in the compiler don’t simplify the predicate too much? For example:

    if (a>7) {
      x = a;
      if (ProtectFromSpeculation(a>5, x)) {
         ... = v[x];
         ...
      }
    }

    how to prevent this from being optimized to:

    if (a>7) {
      x = a;
      if (ProtectFromSpeculation(true, x)) {
        ... = v[x];
        ...
      }
    }

    which no longer provides protection.

No matter what, this will require deep compiler support to implement. Even without the example you give, these constructs fundamentally violate the rules the optimizer uses: they are by definition no-ops for execution of the program!

This means we will have to build up specific and dedicated infrastructure in the compiler to model these as having special semantics. That exact infrastructure can provide whatever optimization barriers are necessary to get the desired behavior. For example, the code generation I suggest above for x86 cannot be implemented in LLVM using its IR alone (I've actually tried). We'll have to model this specially both in the IR and in the code generator in order to produce the kind of specific pattern that is necessary.

But there is also the question of what burden do we want to place on the user of these intrinsics vs. what performance hit we're willing to accept due to optimization barriers. I could imagine two approaches here:

  1. It is the programmer's responsibility to correctly protect any predicates that their application is sensitive to. As a consequence, if the a>5 predicate is sensitive for the application, so must the a>7 predicate be, and it is the programmer's responsibility to protect both of them. This allows the implementation to have the minimal set of optimization barriers, but may make it difficult for programmers to use correctly.
  2. The predicate provided to these APIs is truly special and is forced to be a *dynamic* predicate. That is, we require the compiler to emit the predicate as if no preconditions existed. There are ways to model this in LLVM and, I assume, any compiler. As a trivial (but obviously bad) example: all references to variables within the predicate could be lowered by rinsing that SSA value through an opaque construct like inline asm.

There is clearly a "programmer ease / security" vs. "better optimization" tradeoff between the two. If one isn't *clearly* the correct choice in all cases, we could even expose both behind separate APIs that try to make it clear the extent of protections provided.

Does that make sense?
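The "rinse through an opaque construct" idea in (2) can be sketched at the source level with an empty inline-asm value barrier. This is illustrative only and relies on a GCC/Clang extension; a real implementation would do this inside the compiler, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Value barrier: the empty asm claims to read and write v in a register,
// so the optimizer must materialize v and can no longer assume anything
// about its value afterwards (e.g. it cannot fold a>5 to true just
// because the code is dominated by an a>7 check).
static inline int64_t opaque(int64_t v) {
  asm volatile("" : "+r"(v));
  return v;
}

// The earlier example with the predicate operand laundered, so the
// comparison is emitted dynamically.
static bool guarded_predicate(int64_t a) {
  return opaque(a) > 5;
}
```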

Thanks for sharing your thoughts, Chandler.
Yes, this does make a lot of sense to me. I was thinking roughly along the same lines - apart from maybe fixating more on a single solution. Your above thoughts for potentially 2 related solutions - or at least experimenting with the 2 related solutions - make a lot of sense to me.
I would expect that only providing option (1) may not be ideal, as it may be hard in some cases to protect all flows, e.g. when all possible cross-translation-unit, inter-procedural flows need to be considered.
Instinctively, I would expect that (2) might be more appealing to many programmers, if its performance overhead isn't too high. I also expect that at least a fairly naive way to force the predicate to be dynamic would not be too hard to implement in a front-end. Although I say that having hardly ever written any front-end code in practice, it just seems at least conceptually not overly hard. I hope to find some time in the near future to experiment with this. In the meantime, if you, or anyone else, has any further insights to share: I'm all ears!

Thanks!

There is clearly a "programmer ease / security" vs. "better optimization" tradeoff between the two. If one isn't *clearly* the correct choice in all cases, we could even expose both behind separate APIs that try to make it clear the extent of protections provided.

In my experience, relying on programmers to get it right will inevitably fail. When there's a correctness issue, usually mistakes of that kind can be caught; however, security is not generally part of the "correctness" mindset of programmers, even people who should know better. I once had somebody tell me, with a straight face, that an obviously insecure system call was okay because it was an unpublished API and therefore could not be abused.

Security-related intrinsics, more so than most APIs, should be easy to use correctly and hard to use incorrectly.