Introduce llvm.nospeculateload intrinsic
Needs Review (Public)

Authored by kristof.beyls on Fri, Jan 5, 3:07 AM.



Recently, Google Project Zero disclosed several classes of attack
against speculative execution. One of these, known as variant-1
(CVE-2017-5753), allows explicit bounds checks to be bypassed under
speculation, providing an arbitrary read gadget. Further details can be
found on the GPZ blog [1].
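
For readers unfamiliar with variant-1, the pattern in question can be sketched roughly as below. This is only an illustrative sketch: the names array1, array2, array1_size and victim are hypothetical and do not come from this patch, and the stride of 64 assumes a 64-byte cache line.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tables, named for illustration only. */
static uint8_t array1[16];
static uint8_t array2[256 * 64];   /* 64 = assumed cache-line size */
static size_t array1_size = sizeof array1;

/* Architecturally, array1 is never read out of bounds here. If the
   branch is mispredicted, however, a CPU may speculatively load
   array1[x] for an out-of-range x and use that byte to index array2,
   leaving a cache footprint that depends on the out-of-bounds data. */
uint8_t victim(size_t x)
{
    if (x < array1_size)               /* the explicit bounds check */
        return array2[array1[x] * 64]; /* dependent second load */
    return 0;
}
```

The bounds check is correct as written; the problem is only what the hardware may do before the check resolves.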

This patch introduces a new LLVM-IR intrinsic, called
llvm.nospeculateload, which enables the implementation of the new
clang-level builtin __builtin_load_no_speculate, see review

This new intrinsic provides a mechanism for limiting speculation by a
CPU after a bounds-checked memory access. We've tried to design this in
such a way that it can be used for any target where this might be
necessary. The patch consists of both target-specific functionality
for Arm and AArch64 code generation, and target-independent
functionality that other targets can reuse.

[1] More information on the topic can be found here:
Arm specific information can be found here:


kristof.beyls created this revision. Fri, Jan 5, 3:07 AM

Just as an FYI, we have been experimenting with similar APIs ourselves. We developed two candidate alternative APIs that, IMO, seem substantially better than this.

Sadly, we've just not had the time to prepare them for publication (in part due to the unexpected early disclosure). At least on x86, these are likely to be significantly cheaper, and I suspect the same will be true on ARM. In our experience auditing a few quite large systems where this is relevant, they will also likely be significantly easier to deploy.

I don't really want to hold this up if it needs to land very quickly though. Folks on our end will be working Very Rapidly on at least sharing the design we have in mind if not a complete implementation. Hopefully early next week, but we're still playing a bit of catch-up....

emaste added a subscriber: emaste. Fri, Jan 5, 4:01 AM
reames added a subscriber: reames. Fri, Jan 5, 11:45 AM

A design variation on this which may be worth considering is to phrase this as a speculative use barrier. That is, don't include the load at all, simply provide an intrinsic which guarantees that the result of the load (or any other instruction) will not be consumed by a speculative use with potential side-channel visible side effects.

i.e. restructure the intrinsic with the following form:
declare T @llvm.nospeculate(T %value)
declare T @llvm.nospeculate(T %value, i1 %spec_condition)

(The latter variant is for when the condition that must not be speculated past is known, but this has unresolved design challenges around CSE of conditions. TBD)

An example using the former would be:
%val = load i32, i32* %potentially_out_bounds
%val.forced = call i32 @llvm.nospeculate(i32 %val)
use %val.forced

I'm still thinking through what this would lower to on x86, but I think we can find cheapish instruction sequences which force the first load to retire before the use, or treat this as a scheduling constraint.

Thanks for the feedback, Chandler and Philip!

Please let me explain how we've ended up with the API design as proposed.

We started from two observations:

  1. We need an API/intrinsic that can be implemented well in all major compilers, including gcc and clang/llvm.
  2. We need an API/intrinsic that works well across architectures.

For Arm, the recommended mitigation for protecting against Spectre variant 1 is to generate a LOAD->CSEL->CSDB instruction sequence, where CSDB is a new barrier instruction.
This sequence gives protection on all Arm implementations.
This is explained in far more detail at, in section "Software mitigations", pages 4-8.

The need to generate the full LOAD->CSEL->CSDB sequence explains why the proposed intrinsic contains the semantics of loading a value, providing it is within bounds.
Being able to generate the LOAD->CSEL->CSDB sequence from the intrinsic is essential for AArch64 and ARM targets.

Hopefully that explains the need for the ptr, lower_bound and upper_bound parameters.

The cmpptr and failval parameters are there to make it easier to use in certain scenarios, for example:

For failval, the idea is that code like

if (ptr >= base && ptr < limit) // bounds check
  return *ptr;
return FAILVAL;

can easily be rewritten as

return __builtin_load_no_speculate (ptr, base, limit, FAILVAL);
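
To make the intended result concrete, here is a rough reference model of the four-argument form, specialised to int. It is only a sketch: load_no_speculate_model is a hypothetical name, not the real builtin, and it models the architectural result only. It gives no protection against speculation, which is exactly the part that needs compiler and target support.

```c
#include <stdint.h>

/* Hypothetical stand-in for illustration; NOT the real builtin.
   Returns *ptr when ptr lies in [base, limit), otherwise failval.
   Comparisons go through uintptr_t so that pointers into different
   objects can be compared without undefined behaviour. */
int load_no_speculate_model(const int *ptr, const int *base,
                            const int *limit, int failval)
{
    uintptr_t p = (uintptr_t)ptr;

    if (p >= (uintptr_t)base && p < (uintptr_t)limit)
        return *ptr;    /* in bounds: perform the load */
    return failval;     /* out of bounds: ptr is never dereferenced */
}
```

With a four-element buffer buf, load_no_speculate_model (buf + 2, buf, buf + 4, -1) yields buf[2], while passing buf + 4 as ptr yields -1 without touching memory.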

The cmpptr parameter was introduced after hearing of a need from the Linux kernel. They have some cases where cmpptr may be a pointer to an atomic type and want to do something like

if (cmpptr >= lower && cmpptr < upper)
  val = __atomic_read_and_inc (cmpptr);

By separating out cmpptr from ptr you can now write the following, which removes the need to try and wrestle "no-speculate" semantics into the atomic builtins:

if (cmpptr >= lower && cmpptr < upper)
  {
    T tmp_val = __atomic_read_and_inc (cmpptr);
    val = __builtin_load_no_speculate (&tmp_val, lower, upper, 0,
                                       cmpptr);
  }
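
The same idea can be sketched as a compilable model. Everything named here is an assumption for illustration: load_no_speculate_cmp_model and read_and_inc_checked are hypothetical names, and C11 atomic_fetch_add stands in for the read-and-increment operation. As before, this models only the architectural result and provides no speculation barrier.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the five-argument form: the bounds check is
   applied to cmpptr, while the value is loaded from ptr. */
int load_no_speculate_cmp_model(const int *ptr, const void *lower,
                                const void *upper, int failval,
                                const void *cmpptr)
{
    uintptr_t c = (uintptr_t)cmpptr;

    if (c >= (uintptr_t)lower && c < (uintptr_t)upper)
        return *ptr;
    return failval;
}

/* Reads and increments *counter only if it lies in [lower, upper).
   counter, lower and upper are assumed to point into the same array. */
int read_and_inc_checked(atomic_int *counter, atomic_int *lower,
                         atomic_int *upper)
{
    int val = 0;

    if (counter >= lower && counter < upper) {
        /* The atomic operation itself is left untouched ... */
        int tmp_val = atomic_fetch_add(counter, 1);
        /* ... and the result is passed through the checked load,
           with the bounds applied to counter rather than &tmp_val. */
        val = load_no_speculate_cmp_model(&tmp_val, lower, upper, 0,
                                          counter);
    }
    return val;
}
```

The point of the split is visible in read_and_inc_checked: the atomic access needs no special variant, because the bounds are checked against the original pointer while the load is from a local temporary.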

There is a bit more explanation on the rationale for the failval and cmpptr parameters at

Furthermore, there are a few more details on the use of this intrinsic at

I hope the above helps to explain the rationale for the proposed API design.