This is an archive of the discontinued LLVM Phabricator instance.

Paths

- docs/LangRef.rst
- include/llvm/CodeGen/ISDOpcodes.h
- include/llvm/CodeGen/SelectionDAGNodes.h
- include/llvm/IR/Intrinsics.td
- include/llvm/Target/TargetSelectionDAG.td
- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
- lib/CodeGen/SelectionDAG/LegalizeTypes.h
- lib/CodeGen/SelectionDAG/SelectionDAG.cpp
- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
- lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
- lib/Target/AArch64/AArch64AsmPrinter.cpp
- lib/Target/AArch64/AArch64ISelLowering.cpp
- lib/Target/AArch64/AArch64InstrInfo.cpp
- lib/Target/AArch64/AArch64InstrInfo.td
- lib/Target/ARM/ARMAsmPrinter.h
- lib/Target/ARM/ARMAsmPrinter.cpp
- lib/Target/ARM/ARMISelLowering.cpp
- lib/Target/ARM/ARMInstrInfo.td
- lib/Target/ARM/ARMInstrThumb2.td
- test/CodeGen/AArch64/no-speculate.ll
- test/CodeGen/ARM/no-speculate.ll

Differential D41761

Introduce llvm.nospeculateload intrinsic
Needs Review · Public

Authored by kristof.beyls on Jan 5 2018, 3:07 AM.

Details

Reviewers

olista01
javed.absar

Summary

Recently, Google Project Zero disclosed several classes of attack
against speculative execution. One of these, known as variant-1
(CVE-2017-5753), allows explicit bounds checks to be bypassed under
speculation, providing an arbitrary read gadget. Further details can be
found on the GPZ blog [1].

This patch introduces a new LLVM-IR intrinsic, called
llvm.nospeculateload, which enables the implementation of the new
clang-level builtin __builtin_load_no_speculate, see review
https://reviews.llvm.org/D41760.

This new intrinsic provides a mechanism for limiting speculation by a
CPU after a bounds-checked memory access. We've tried to design this in
such a way that it can be used for any target where this might be
necessary. The patch consists of both target-specific functionality
for Arm and AArch64 code generation, and target-independent
functionality that other targets can reuse.

[1] More information on the topic can be found here:
https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
Arm specific information can be found here:
https://www.arm.com/security-update

Event Timeline

kristof.beyls created this revision. Jan 5 2018, 3:07 AM

Herald added subscribers: javed.absar, aemerson. Jan 5 2018, 3:07 AM

kristof.beyls added a child revision: D41760: Introduce __builtin_load_no_speculate. Jan 5 2018, 3:08 AM

Just as an FYI, we have been experimenting with similar APIs ourselves. We developed two candidate alternative APIs that, IMO, seem substantially better than this.

Sadly, we've just not had the time to prepare them for publication (in part due to the unexpected early disclosure). At least on x86, these are likely to be significantly cheaper. I suspect the same will be true on ARM. They will also likely be significantly easier for deployment in our experience auditing a few quite large systems where this is relevant.

I don't really want to hold this up if it needs to land very quickly though. Folks on our end will be working Very Rapidly on at least sharing the design we have in mind if not a complete implementation. Hopefully early next week, but we're still playing a bit of catch-up....

rogfer01 added a subscriber: rogfer01. Jan 5 2018, 3:50 AM

JDevlieghere added a subscriber: JDevlieghere. Jan 5 2018, 3:55 AM

emaste added a subscriber: emaste. Jan 5 2018, 4:01 AM

kristof.beyls added a subscriber: rearnsha. Jan 5 2018, 5:49 AM

qcolombet added a subscriber: qcolombet. Jan 5 2018, 9:29 AM

A design variation on this which may be worth considering is to phrase this as a speculative use barrier. That is, don't include the load at all, simply provide an intrinsic which guarantees that the result of the load (or any other instruction) will not be consumed by a speculative use with potential side-channel visible side effects.

i.e. restructure the intrinsic with the following form:
declare T @llvm.nospeculate(T %value)
declare T @llvm.nospeculate(T %value, i1 %spec_condition)

(The latter variant is for when the problematic condition to speculate is known, but this has unresolved design challenges around CSE of conditions. TBD)

An example using the former would be:
%val = load i32, i32* %potentially_out_bounds
%val.forced = call T @llvm.nospeculate(T %val)
use %val.forced

I'm still thinking through what this would lower to on x86, but I think we can find cheapish instruction sequences which force the first load to retire before the use, or treat this as a scheduling constraint.

weimingz added a subscriber: weimingz. Jan 9 2018, 10:35 AM

Thanks for the feedback, Chandler and Philip!

Please let me explain how we've ended up with the API design as proposed.

We started from the observation that:

- We need an API/intrinsic that can be implemented well in all major compilers, including gcc and clang/llvm.
- We need an API/intrinsic that works well across architectures.

For Arm, the recommended mitigation for protecting against Spectre variant 1 is to generate a LOAD->CSEL->CSDB instruction sequence, where CSDB is a new barrier instruction.
This sequence gives protection on all Arm implementations.
This is explained in far more detail at https://developer.arm.com/support/security-update/download-the-whitepaper, in section "Software mitigations", pages 4-8.

The need to generate the full LOAD->CSEL->CSDB sequence explains why the proposed intrinsic contains the semantics of loading a value, providing it is within bounds.
Being able to generate the LOAD->CSEL->CSDB sequence from the intrinsic is essential for AArch64 and ARM targets.

Hopefully that explains the need for the ptr, lower_bound and upper_bound parameters.

The cmpptr and failval parameters are there to make it easier to use in certain scenarios, for example:

For failval, the idea is that code like

if (ptr >= base && ptr < limit) // bounds check
  return *ptr;
return FAILVAL;

can easily be rewritten as

return __builtin_load_no_speculate (ptr, base, limit, FAILVAL);

The cmpptr parameter was introduced after hearing of a need from the Linux kernel developers. They have some cases where cmpptr may be a pointer to an atomic type and want to do something like

if (cmpptr >= lower && cmpptr < upper)
  val = __atomic_read_and_inc (cmpptr);

By separating out cmpptr from ptr you can now write the following, which removes the need to try and wrestle "no-speculate" semantics into the atomic builtins:

if (cmpptr >= lower && cmpptr < upper)
  {
    T tmp_val = __atomic_read_and_inc (cmpptr);
    val = builtin_load_no_speculate (&tmp_val, lower, upper, 0,
                                     cmpptr);
  }
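To make the roles of failval and cmpptr concrete, here is a minimal C++ sketch of the architectural (non-speculative) behaviour described above. The names load_no_speculate_model, read_or_fail and read_and_inc are hypothetical, and std::atomic's fetch_add stands in for __atomic_read_and_inc. The model does nothing to inhibit speculation; it only shows which value the builtin is expected to return, under the assumption that the bounds form a half-open range [lower, upper).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reference model of the proposed builtin. It captures only the
// architectural result; a real implementation needs compiler and target
// support to constrain speculation.
template <typename T>
T load_no_speculate_model(const T* ptr, const void* lower, const void* upper,
                          T failval, const void* cmpptr) {
  const auto lo = reinterpret_cast<std::uintptr_t>(lower);
  const auto hi = reinterpret_cast<std::uintptr_t>(upper);
  const auto cmp = reinterpret_cast<std::uintptr_t>(cmpptr);
  assert(lo <= hi && "bounds must describe a valid half-open range");
  // cmpptr is the pointer that is checked; ptr is the pointer that is loaded.
  return (cmp >= lo && cmp < hi) ? *ptr : failval;
}

// failval pattern: the bounds-checked read collapses into a single call,
// with ptr doubling as cmpptr.
inline int read_or_fail(const int* ptr, const int* base, const int* limit) {
  return load_no_speculate_model(ptr, base, limit, -1, ptr);
}

// cmpptr pattern: the atomic operation uses cmpptr, while the protected load
// reads the temporary that holds its result.
inline int read_and_inc(std::atomic<int>* cmpptr, std::atomic<int>* lower,
                        std::atomic<int>* upper) {
  int val = 0;
  if (cmpptr >= lower && cmpptr < upper) {
    int tmp_val = cmpptr->fetch_add(1);
    val = load_no_speculate_model(&tmp_val, lower, upper, 0, cmpptr);
  }
  return val;
}
```

In the second helper the bounds check still appears in user code, which is exactly the case cmpptr was introduced for: the check is made on one pointer and the protected load is made through another.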

There is a bit more explanation on the rationale for the failval and cmpptr parameters at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00594.html.

Furthermore, there are a few more details on the use of this intrinsic at https://developer.arm.com/support/security-update/compiler-support-for-mitigations

I hope the above helps to explain the rationale for the proposed API design.

sdardis added a subscriber: sdardis. Jan 11 2018, 12:59 PM

The API design has been discussed over the past weeks in detail on the gcc mailing list. As a result of that, we propose to adapt the API, to enable efficient code generation also on architectures that need to generate a barrier instruction to achieve the desired semantics.

The main change in the proposed API is to drop the failval parameter and to tweak the semantics to the below.
There is a more detailed rationale for these changes at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01546.html

I haven't updated the code to implement the new specification yet, but thought I'd share the new specification as soon as possible, while I find the time to adapt the implementation:

This is an overloaded intrinsic. You can use llvm.speculationsafeload on any integer type that can legally be loaded on the target, and any pointer type. However, not all targets support this intrinsic at the moment.

declare T @llvm.speculationsafeload.T(T* %ptr, i8* %lower_bound, i8* %upper_bound, i8* %cmpptr)

Arguments:
""""""""""

The first argument is a pointer to the value to load, the second is a pointer used as a lower bound, and the third is a pointer used as an upper bound. The fourth argument, %cmpptr, is the pointer that is compared against these bounds.

Semantics:
""""""""""

1. When the intrinsic is not being executed speculatively:
   - if %lower_bound <= %cmpptr < %upper_bound, the value at address %ptr is returned.
   - if %cmpptr is not within these bounds, the behaviour is undefined.
2. When the intrinsic is being executed speculatively, either:
   - Execution of instructions following the intrinsic that have a dependency on the result of the intrinsic will be blocked, until the intrinsic is no longer executing speculatively. At this point, the semantics under point 1 above apply.
   - Speculation may continue using the value at address %ptr as the return value of the intrinsic, if %lower_bound <= %cmpptr < %upper_bound, or an unspecified constant value if %cmpptr is outside these bounds.
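The revised semantics can be sketched as a small C++ model, purely as an aid to reading the specification. Everything here is hypothetical: the Execution enum simulates whether the call is "being executed speculatively", the out-of-bounds architectural case (undefined behaviour in the specification) is modelled as an assertion failure, and the "unspecified constant" is arbitrarily chosen to be zero. The blocking alternative cannot be expressed in sequential code, so only the "speculation may continue" alternative is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical switch that simulates the two cases of the specification.
enum class Execution { Architectural, Speculative };

template <typename T>
T speculation_safe_load_model(const T* ptr, const void* lower_bound,
                              const void* upper_bound, const void* cmpptr,
                              Execution mode) {
  const auto lo = reinterpret_cast<std::uintptr_t>(lower_bound);
  const auto hi = reinterpret_cast<std::uintptr_t>(upper_bound);
  const auto cmp = reinterpret_cast<std::uintptr_t>(cmpptr);
  const bool in_bounds = cmp >= lo && cmp < hi;

  if (mode == Execution::Architectural) {
    // The specification leaves the out-of-bounds case undefined; the model
    // simply traps so that misuse is visible.
    assert(in_bounds && "out-of-bounds cmpptr is undefined behaviour");
    return *ptr;
  }
  // Speculative case, "speculation may continue" alternative: the loaded value
  // if in bounds, otherwise an unspecified constant (zero in this model).
  return in_bounds ? *ptr : T{};
}
```

Note that, unlike the earlier proposal, there is no failval: the caller is expected to perform the architectural bounds check itself before invoking the intrinsic.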

It would be awesome to have static analysis rules to help identify *where* to put these intrinsics. Is somebody working on that? Or did I miss it?

"When the intrinsic is being executed speculatively"

What does this mean?

LLVM IR defines the semantics of a program only in terms of visible side-effects. It does not define any semantics for code which does not execute, and it does not provide any guarantees in terms of what data can be leaked via side-channels. If you're going to attach meaningful semantics to speculationsafeload, you also need to generally define what code the compiler is allowed to emit in blocks which could be speculatively executed. As an extreme example, there needs to be some rule which prevents the compiler from inserting an attackable load immediately before a call to speculationsafeload.

In D41761#979115, @efriedma wrote:

"When the intrinsic is being executed speculatively"

What does this mean?

LLVM IR defines the semantics of a program only in terms of visible side-effects. It does not define any semantics for code which does not execute, and it does not provide any guarantees in terms of what data can be leaked via side-channels. If you're going to attach meaningful semantics to speculationsafeload, you also need to generally define what code the compiler is allowed to emit in blocks which could be speculatively executed. As an extreme example, there needs to be some rule which prevents the compiler from inserting an attackable load immediately before a call to speculationsafeload.

Hi Eli,

Thanks for your feedback and insightful comments.

Regarding "visible side-effects", I think one could argue that the speculative execution of a load can lead to a visible side-effect, as demonstrated by Spectre, even though LLVM IR doesn't define the concept of speculative execution.

Whether code that executes on a CPU due to mis-speculation is "code that executes" or "code that does not execute" depends on the point of view, IMHO.
From the point of view of the actual micro-architectural execution on the CPU, it does execute, even if only speculatively.
In the execution model that LLVM (and probably most other compilers) uses, that execution doesn't happen.
I wouldn't say that code that is executed due to mis-speculation "doesn't happen", but rather that LLVM-IR's model isn't (currently?) capable of representing that.

Should LLVM-IR be extended to be able to represent (some forms of) mis-speculation at some point? I honestly don't know if there would be much value in that.

All of the above being said, with this intrinsic, we aim to make it possible for software developers to protect specific pieces of source code against Spectre variant 1 attacks, and hence would like to get it into both clang and gcc soon.

I do think your point around "speculative execution" not being modelled in LLVM-IR is valid. Maybe the semantics of this intrinsic could be specified avoiding the use of the concept "speculative execution" as much as possible.
It might not be possible to avoid it completely as protection against side channel leaks happening during mis-speculation is the raison d'etre for this intrinsic.
Anyway if that would at least address one of your concerns, I'll look into that.

I believe another concern you raise is around what code transformations compilers may do that could render the protection of such an intrinsic less effective?
Did you have any specific example in mind?
In the implementation, we give the compiler very little knowledge of exactly what is loaded through the intrinsic, to reduce the probability that it would itself introduce non-protected loads to the same location or derived from it.
I'm wondering if you had been thinking of a potential example where this wouldn't be enough?

Thanks!

Kristof

In D41761#978671, @probinson wrote:

It would be awesome to have static analysis rules to help identify *where* to put these intrinsics. Is somebody working on that? Or did I miss it?

Indeed it would be awesome to have static analysis to help identify where these intrinsics should be put. The (non-public) experiments I've seen seem to indicate that it's hard to get a low false positive rate, probably at least in part due to static analyzers not getting information on which variables can be influenced by external input and which cannot.

I don't think it's likely the compiler would intentionally introduce a load using the same pointer as the operand to a speculationsafeload; given most transforms don't understand what speculationsafeload does, the compiler has no reason to introduce code like that (even if it isn't technically prohibited by LangRef).

More practically, I'm worried about the possibility that code which doesn't appear to be vulnerable at the source-code level will get lowered to assembly which is vulnerable. For example, the compiler could produce code where the CPU speculates a load from an uninitialized pointer value. Without an IR/MIR model for speculation, we have no way to prevent this from happening.

nikhgupt added a subscriber: nikhgupt. Jan 19 2018, 3:10 PM

Adding a few folks on my team who will hopefully take over helping here (my time has vanished). Note that they aren't on llvm-commits, so please reply-to-all etc.

Also, for those joining, a link to the mailing list archive:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180115/516684.html

(I hope you can bear with me while I come up to speed; I understand there's some time pressure here, so I'm erring on the side of speaking sooner)

The proposed API for nospeculateload seems like it could be problematic for programmers, because it gives them no way to tell whether the bounds check passed, unless they are able to identify a failval sentinel that can never be the result of a successful load (and yet is safe to use as the speculative result of the load). Thus, it seems likely that in a lot of cases, the bounds check will be duplicated: once inside nospeculateload, and once in user code. That will make code using this API "feel" inefficient, which will tend to discourage programmers from using it. Furthermore, if it actually is inefficient, that will create pressure for optimization passes to "peek inside" nospeculateload in order to eliminate the duplication, and that seems like a can of worms we really don't want to open.

Another way of putting it is that we probably want this API to be as primitive as possible, because the more logic we put inside the intrinsic, the greater the risk that some parts of it will be unsuited for some users. Consequently, we've been experimenting with APIs that are concerned solely with the bounds check, rather than with the subsequent load:

int64_t SecureBoundedOffset(int64_t offset, int64_t bound);

At the abstract machine level, this function just returns offset, and has the precondition that 0 <= offset < bound, but it also ensures that for speculative executions, offset will always appear to be within those bounds. There are also variants for uint64_t, and variants that take the base-2 log of the bound, for greater efficiency when the bound is a power of two.
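As an illustration of how such a function might behave, here is a sketch of the uint64_t variant using branch-free index masking, similar in spirit to the Linux kernel's array_index_nospec helper. This is an assumption about one possible implementation, not the design referred to above: the name SecureBoundedOffsetSketch is hypothetical, the bound is assumed to be at most 2^63, and plain C++ gives no guarantee that the compiler keeps the computation branch-free, so a real version would need compiler support or inline assembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: returns offset when offset < bound, and 0 otherwise,
// computed without a conditional branch at the source level.
inline std::uint64_t SecureBoundedOffsetSketch(std::uint64_t offset,
                                               std::uint64_t bound) {
  assert(bound <= (std::uint64_t{1} << 63) && "sketch assumes bound <= 2^63");
  // The top bit of (offset | (bound - 1 - offset)) is set exactly when offset
  // is out of range: either offset itself is huge, or the subtraction wrapped.
  const std::uint64_t out_of_range = (offset | (bound - 1 - offset)) >> 63;
  // out_of_range == 0 gives an all-ones mask; out_of_range == 1 gives zero.
  const std::uint64_t mask = out_of_range - 1;
  return offset & mask;
}
```

For in-range offsets the function is the identity, matching the abstract-machine description; the clamping to zero only matters on paths where the precondition 0 <= offset < bound does not hold, which is where a mis-speculated execution would otherwise read out of bounds.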

template <typename T, typename... ZeroArgs>
bool IsPointerInRange(T*& pointer, T* begin, T* end, ZeroArgs... args);

This function returns whether pointer is between begin and end, and also guarantees that if the function returns false, then any speculative execution which assumes it to be true will treat pointer and args... as zero (all ZeroArgs must be integers or pointers). Notice that this API allows the optimizer to hoist loads past the branch, so long as the loads don't depend on pointer or args...; I'm not sure if that's true of nospeculateload or SecureBoundedOffset.

Notice that by not handling the load itself, these APIs avoid the ptr/cmpptr awkwardness, as well as the need for the user to designate a sentinel value. One tradeoff is that whereas nospeculateload can be thought of as a conditional load, plus some "security magic", these APIs are harder to understand without explicitly thinking about speculative execution. However, I'm not sure that's much of a problem in practice-- if you don't want to think about speculative execution, you probably shouldn't be using this API in the first place.

Is there any way we could implement an interface like those efficiently on ARM?

In D41761#980755, @efriedma wrote:

I don't think it's likely the compiler would intentionally introduce a load using the same pointer as the operand to a speculationsafeload; given most transforms don't understand what speculationsafeload does, the compiler has no reason to introduce code like that (even if it isn't technically prohibited by LangRef).

Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what speculationsafeload does?

More practically, I'm worried about the possibility that code which doesn't appear to be vulnerable at the source-code level will get lowered to assembly which is vulnerable. For example, the compiler could produce code where the CPU speculates a load from an uninitialized pointer value. Without an IR/MIR model for speculation, we have no way to prevent this from happening.

When you say "code which doesn't appear to be vulnerable at the source-code level", do you mean "code that is explicitly protected by speculationsafeload", or "code that doesn't appear to need speculationsafeload in the first place"? If the former, could you give a more concrete example?

It seems to me that we ought to be able to specify this intrinsic without having an explicit model of CPU speculation in the IR, because at the IR level we can just constrain the types of transformations that can be performed on this intrinsic. That way we only need to talk about speculation when we're specifying how this intrinsic is used to generate code for a specific platform that happens to feature CPU-level branch speculation. Very tentatively, I think the specific constraints on transformations that are needed here are that the intrinsic has unmodeled side-effects (so it can't be eliminated or executed speculatively), and that it should be treated as writing to all memory (or only to pointer and args... in the case of an API like IsPointerInRange).

In D41761#989477, @gromer wrote:
template <typename T, typename... ZeroArgs>
bool IsPointerInRange(T*& pointer, T* begin, T* end, ZeroArgs... args);
This function returns whether pointer is between begin and end, and also guarantees that if the function returns false, then any speculative execution which assumes it to be true will treat pointer and args... as zero (all ZeroArgs must be integers or pointers). Notice that this API allows the optimizer to hoist loads past the branch, so long as the loads don't depend on pointer or args...; I'm not sure if that's true of nospeculateload or SecureBoundedOffset.

Chandler points out offline that this can work for any predicate, not just bound checking, so perhaps the API could instead be something like:

template <typename... ZeroArgs>
bool ProtectFromSpeculation(bool predicate, ZeroArgs&... args);

with the semantics that if predicate is false, speculative execution that treats it as true will also treat args... as zero. This has the arguable problem that, like SecureBoundedOffset, it's a no-op as far as the C++ abstract machine is concerned, so it can't really be explained without talking about speculation, but it's quite simple and general, and I'm not at all convinced that the connection of this API to speculative execution is something we should hide.
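A call site using this API might look like the sketch below. The ProtectFromSpeculation defined here is only a stand-in with the abstract-machine behaviour (it returns the predicate and leaves args untouched); it provides no protection, since zeroing args on mis-speculated paths is precisely the part that needs compiler and target support. The checked_read helper and its -1 failure value are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in with abstract-machine semantics only: a no-op that returns the
// predicate. A real implementation would additionally ensure that speculative
// execution past a false predicate sees every argument as zero.
template <typename... ZeroArgs>
bool ProtectFromSpeculation(bool predicate, ZeroArgs&...) {
  return predicate;
}

// Hypothetical bounds-checked read: index is listed as an argument so that a
// mis-speculated path would index the table at 0 rather than out of bounds.
inline int checked_read(const int* table, std::size_t size, std::size_t index) {
  assert(table != nullptr);
  if (ProtectFromSpeculation(index < size, index)) {
    return table[index];
  }
  return -1;
}
```

The user-visible branch stays in user code, so there is no duplicated bounds check and no sentinel value to choose, which is the main difference from the load-based intrinsic.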

Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what speculationsafeload does?

Prohibit what, exactly? According to current LangRef rules, it's legal to introduce a dead load to an arbitrary pointer (even if the compiler can't prove it's dereferenceable).

do you mean "code that is explicitly protected by speculationsafeload", or "code that doesn't appear to need speculationsafeload in the first place"?

Both, I guess?

I guess I'll describe the uninitialized pointer problem in a little more detail. The idea is that you have code roughly like this:

// Code in a function; g is a global uint8_t.
bool b = f1();
uint8_t* p;
if (b)
    p = &g;
g = 10;
if (b)
    f4(user_array[*p]);

If b is true, we load user_array[10]. If b is false, we speculate user_array[*p], where p is uninitialized (i.e. user-controlled, if you're unlucky). You now have a variant-1 attack.

There's a lot of potential variants of this. For example, instead of user_array, we have another speculatively-uninitialized pointer. Or the load which leaks the data to the user could actually be a speculationsafeload, intended to stop a different variant-1 attack. Or the code could be spread over multiple functions. Or the if statement might not be an if statement (there are lots of ways to get a conditional branch in assembly). Or "p" might be a pointer to a constant pool, so the load-from-undef isn't written in the source code at all.

In D41761#989706, @efriedma wrote:

Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what speculationsafeload does?

Prohibit what, exactly? According to current LangRef rules, it's legal to introduce a dead load to an arbitrary pointer (even if the compiler can't prove it's dereferenceable).

I'm basically just asking if we should have some form of assurance stronger than "most transforms don't understand this intrinsic well enough to violate the invariants it relies on". I have no idea what form that assurance would take, since I don't know how LangRef handles such matters.

I guess I'll describe the uninitialized pointer problem in a little more detail. The idea is that you have code roughly like this:
// Code in a function; g is a global uint8_t.
bool b = f1();
uint8_t* p;
if (b)
    p = &g;
g = 10;
if (b)
    f4(user_array[*p]);
If b is true, we load user_array[10]. If b is false, we speculate user_array[*p], where p is uninitialized (i.e. user-controlled, if you're unlucky). You now have a variant-1 attack.

Hmm. If I understand correctly, the ProtectFromSpeculation() API I mentioned earlier could guard against this, by including p in the variadic list of clobbers.

There's a lot of potential variants of this. For example, instead of user_array, we have another speculatively-uninitialized pointer. Or the load which leaks the data to the user could actually be a speculationsafeload, intended to stop a different variant-1 attack. Or the code could be spread over multiple functions. Or the if statement might not be an if statement (there are lots of ways to get a conditional branch in assembly). Or "p" might be a pointer to a constant pool, so the load-from-undef isn't written in the source code at all.

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation? The only possibility I'm coming up with is that the load has additional safety requirements besides the speculationsafeload bounds check. If so, that might argue for an API that lets you express arbitrary predicates (like ProtectFromSpeculation), rather than just upper/lower bounds checks.

I don't see how the code being spread over multiple functions matters- all that matters are the load, and the branch (or nested branches) that actually guard that load, not any prior branches on the same condition. As for the if-statement not being an if-statement, that's true to the extent that in principle it could be a switch or a loop, but it has to be some kind of conditional control flow that's explicit in user code. This attack is only a platform-level issue because the attacker is exploiting features of the platform to observe the effects of a load that the application-level logic says cannot happen. If the application logic doesn't explicitly prevent any of the loads the attacker is exploiting, that's a plain old application-level vulnerability that LLVM neither can nor should fix.

So it seems to me that all the variations you identify are either application-level vulnerabilities, or can be straightforwardly blocked by something like ProtectFromSpeculation (I'm not quite sure about speculationsafeload).

I have no idea what form that assurance would take, since I don't know how LangRef handles such matters.

Well, I don't really know either; LangRef only describes the abstract virtual machine, mostly. That's part of the problem. :)

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation

Sorry, wasn't quite clear. There are two speculated loads for a variant-1 attack: the load that reads the secret, and the load that leaks the secret to the user. The first load has to be speculation-safe to stop the attack; whether the second is speculation-safe is irrelevant, at least in the proposals so far. That isn't really a fundamental problem, just an illustration that reasoning about these attacks is tricky.

I don't see how the code being spread over multiple functions matters- all that matters are the load, and the branch (or nested branches) that actually guard that load

Well, the CPU doesn't care (assuming it can perfectly predict calls), but it's problematic from an auditing perspective because it's harder to spot. Particularly since the vulnerable code might not explicitly reference any user-controlled data at all.

If the application logic doesn't explicitly prevent any of the loads the attacker is exploiting

A C programmer cannot reasonably come up with a complete list of all the potentially exploitable loads without the compiler being aware that the user needs Spectre-resistant code. There are two key problems:

An exploitable load might not exist in the original program. One example is the switch-to-lookup-table transform. Given:

int a(unsigned x) {
  switch (x) {
  case 0: return 2;
  case 1: return 44;
  case 2: return 23;
  default: return 8;
  }
}

We transform to:

int a(unsigned x) {
  if (x > 2)
    return 8;
  static const int table[] = {2, 44, 23};
  return table[x];
}

Now you have a speculated out-of-bounds load from code which didn't contain any loads.
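The point that the transform is invisible at the abstract-machine level can be checked directly. The sketch below places the two forms side by side under the hypothetical names a_switch and a_table and asserts that they agree; the only difference is that the table form contains a load guarded by a branch, which is what a mis-predicting CPU could execute with an out-of-range x.

```cpp
#include <cassert>

// Source form: no loads, only a switch.
inline int a_switch(unsigned x) {
  switch (x) {
  case 0: return 2;
  case 1: return 44;
  case 2: return 23;
  default: return 8;
  }
}

// Form after a switch-to-lookup-table style transform: a bounds check
// followed by a load from a constant table.
inline int a_table(unsigned x) {
  if (x > 2)
    return 8;
  static const int table[] = {2, 44, 23};
  return table[x];
}

// Architecturally the two forms are indistinguishable for every input tested.
inline void check_forms_agree(unsigned count) {
  for (unsigned x = 0; x < count; ++x)
    assert(a_switch(x) == a_table(x));
}
```

Because both forms return the same values for all inputs, nothing in the source program signals that a load has appeared, so a programmer auditing the source has no place to put a speculation barrier.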

The exploitable code might come out of some non-obvious lowering. Even if a pointer points to something "obviously" safe, it might actually be uninitialized along some impossible path through the function. Probably the easiest example to explain is polly's invariant load hoisting. Basically, if you have a loop like this:

int sum = 0;
for (int i = 0; i < n; ++i) {
  if (b)
    sum += (*p)[i];
}

It gets transformed to something like this:

int *pp;
if (b) pp = *p;
int sum = 0;
for (int i = 0; i < n; ++i) {
  if (b)
    sum += pp[i];
}

So now you have the if() if() pattern which leads to a speculated load from an uninitialized pointer.

In D41761#989799, @efriedma wrote:

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation

Sorry, wasn't quite clear. There are two speculated loads for a variant-1 attack: the load that reads the secret, and the load that leaks the secret to the user. The first load has to be speculation-safe to stop the attack; whether the second is speculation-safe is irrelevant, at least in the proposals so far. That isn't really a fundamental problem, just an illustration that reasoning about these attacks is tricky.

Wait, no, this isn't right. It is actually possible to attack a speculationsafeload, at least in theory.

In general, the logic behind the "soft" speculation barriers being proposed is that they block execution if a speculation check fails. But given that the speculation check is itself computed speculatively, you have to ensure that not only is the result correct in the normal sense, but also that it's correct when computed speculatively. And as I showed earlier, IR transforms can turn an arbitrary value into a speculatively-uninitialized value. Therefore, the speculation check can spuriously succeed, and the speculation barrier doesn't reliably prevent speculation.

In D41761#989477, @gromer wrote:
(I hope you can bear with me while I come up to speed; I understand there's some time pressure here, so I'm erring on the side of speaking sooner)

The proposed API for nospeculateload seems like it could be problematic for programmers, because it gives them no way to tell whether the bounds check passed, unless they are able to identify a failval sentinel that can never be the result of a successful load (and yet is safe to use as the speculative result of the load). Thus, it seems likely that in a lot of cases, the bounds check will be duplicated: once inside nospeculateload, and once in user code. That will make code using this API "feel" inefficient, which will tend to discourage programmers from using it. Furthermore, if it actually is inefficient, that will create pressure for optimization passes to "peek inside" nospeculateload in order to eliminate the duplication, and that seems like a can of worms we really don't want to open.

Another way of putting it is that we probably want this API to be as primitive as possible, because the more logic we put inside the intrinsic, the greater the risk that some parts of it will be unsuited for some users. Consequently, we've been experimenting with APIs that are concerned solely with the bounds check, rather than with the subsequent load:
int64_t SecureBoundedOffset(int64_t offset, int64_t bound);
At the abstract machine level, this function just returns offset, and has the precondition that 0 <= offset < bound, but it also ensures that for speculative executions, offset will always appear to be within those bounds. There are also variants for uint64_t, and variants that take the base-2 log of the bound, for greater efficiency when the bound is a power of two.
template <typename T, typename... ZeroArgs>
bool IsPointerInRange(T*& pointer, T* begin, T* end, ZeroArgs... args);
This function returns whether pointer is between begin and end, and also guarantees that if the function returns false, then any speculative execution which assumes it to be true will treat pointer and args... as zero (all ZeroArgs must be integers or pointers). Notice that this API allows the optimizer to hoist loads past the branch, so long as the loads don't depend on pointer or args...; I'm not sure if that's true of nospeculateload or SecureBoundedOffset.

Notice that by not handling the load itself, these APIs avoid the ptr/cmpptr awkwardness, as well as the need for the user to designate a sentinel value. One tradeoff is that whereas nospeculateload can be thought of as a conditional load, plus some "security magic", these APIs are harder to understand without explicitly thinking about speculative execution. However, I'm not sure that's much of a problem in practice-- if you don't want to think about speculative execution, you probably shouldn't be using this API in the first place.

Is there any way we could implement an interface like those efficiently on ARM?

In D41761#980755, @efriedma wrote:

I don't think it's likely the compiler would intentionally introduce a load using the same pointer as the operand to a speculationsafeload; given most transforms don't understand what speculationsafeload does, the compiler has no reason to introduce code like that (even if it isn't technically prohibited by LangRef).

Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what speculationsafeload does?

More practically, I'm worried about the possibility that code which doesn't appear to be vulnerable at the source-code level will get lowered to assembly which is vulnerable. For example, the compiler could produce code where the CPU speculates a load from an uninitialized pointer value. Without an IR/MIR model for speculation, we have no way to prevent this from happening.

When you say "code which doesn't appear to be vulnerable at the source-code level", do you mean "code that is explicitly protected by speculationsafeload", or "code that doesn't appear to need speculationsafeload in the first place"? If the former, could you give a more concrete example?

It seems to me that we ought to be able to specify this intrinsic without having an explicit model of CPU speculation in the IR, because at the IR level we can just constrain the types of transformations that can be performed on this intrinsic. That way we only need to talk about speculation when we're specifying how this intrinsic is used to generate code for a specific platform that happens to feature CPU-level branch speculation. Very tentatively, I think the specific constraints on transformations that are needed here are that the intrinsic has unmodeled side-effects (so it can't be eliminated or executed speculatively), and that it should be treated as writing to all memory (or only to pointer and args... in the case of an API like IsPointerInRange).

Thanks very much for sharing this, Geoff!
I have a few immediate questions/thoughts, and wonder what you and others here think about them:

Lowering to various instruction sets.

I think one of the questions to ask is whether the APIs here can be lowered well to different instruction sets.
I believe that may be possible for Arm, but I'm still looking into it. It would be useful for us to have a few examples of how these APIs are envisioned to be used in practice, to make sure we understand the proposal well enough. E.g. maybe a few examples in the same spirit as under “More complex cases” at https://developer.arm.com/support/security-update/compiler-support-for-mitigations? Do you happen to have a suggestion of how this intrinsic would best be lowered on some instruction sets other than Arm? There was quite a bit of debate about the ability to efficiently lower the intrinsic in our proposal on the gcc mailing list, e.g. see https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01136.html.

Is the API available in C too?

IIUC, the SecureBoundedOffset intrinsic is usable from both C and C++, but the IsPointerInRange intrinsic can only be used from C++? Do you have ideas around bringing similar functionality to C?

For the variant with a general predicate (bool ProtectFromSpeculation(bool predicate, ZeroArgs&... args);); do you have ideas about how to make sure that the optimizers in the compiler don’t simplify the predicate too much?

For example:

if (a>7) {
  x = a;
  if (ProtectFromSpeculation(a>5, x)) {
     ... = v[x];
     ...
  }
}

how to prevent this from being optimized to:

if (a>7) {
  x = a;
  if (ProtectFromSpeculation(true, x)) {
    ... = v[x];
    ...
  }
}

which leads to no longer giving protection.

In D41761#989799, @efriedma wrote:
I have no idea what form that assurance would take, since I don't know how LangRef handles such matters.

Well, I don't really know either; LangRef only describes the abstract virtual machine, mostly. That's part of the problem. :)

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation

Sorry, wasn't quite clear. There are two speculated loads for a variant-1 attack: the load that reads the secret, and the load that leaks the secret to the user. The first load has to be speculation-safe to stop the attack; whether the second is speculation-safe is irrelevant, at least in the proposals so far. That isn't really a fundamental problem, just an illustration that reasoning about these attacks is tricky.
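For readers who want the two loads spelled out, the widely published variant-1 pattern looks roughly like this. The array names and sizes are illustrative only and do not come from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative names and sizes; nothing here comes from the patch under review.
constexpr std::size_t kArray1Size = 16;
std::uint8_t array1[kArray1Size] = {};
std::uint8_t array2[256 * 64] = {};

std::uint8_t Victim(std::size_t x) {
  if (x < kArray1Size) {  // bounds check that the CPU may mispredict
    // Load 1: under misspeculation with an out-of-bounds x, this can read
    // a secret byte located past the end of array1.
    const std::uint8_t value = array1[x];
    // Load 2: the address depends on value, so the cache line it touches
    // can later reveal value through timing.
    return array2[value * 64];
  }
  return 0;
}
```

Architecturally the function never reads out of bounds; the problem only exists on the mispredicted path.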

I don't see how the code being spread over multiple functions matters- all that matters are the load, and the branch (or nested branches) that actually guard that load

Well, the CPU doesn't care (assuming it can perfectly predict calls), but it's problematic from an auditing perspective because it's harder to spot. Particularly since the vulnerable code might not explicitly reference any user-controlled data at all.

If the application logic doesn't explicitly prevent any of the loads the attacker is exploiting

A C programmer cannot reasonably come up with a complete list of all the potentially exploitable loads without the compiler being aware that the user needs Spectre-resistant code. There are two key problems:

An exploitable load might not exist in the original program. One example is the switch-to-lookup-table transform. Given:
int a(unsigned x) {
  switch (x) {
  case 0: return 2;
  case 1: return 44;
  case 2: return 23;
  default: return 8;
  }
}
We transform to:
int a(unsigned x) {
  if (x > 2)
    return 8;
  const static int table[] = {2, 44, 23};
  return table[x];
}
Now you have a speculated out-of-bounds load from code which didn't contain any loads.
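As an illustration only, a lowering that wanted to avoid the out-of-bounds speculative load could pad the table to a power-of-two size and mask the index. The function name `a_hardened` and the padding choice are assumptions of this sketch. It is written in C++ for readability, but it describes what a code generator might emit: as source code, an optimizer that knows `x <= 2` is entitled to drop the mask again.

```cpp
// Hypothetical hardened form of the transformed function above.
int a_hardened(unsigned x) {
  if (x > 2)
    return 8;
  // Padded to a power-of-two size; the extra slot repeats the default value.
  static const int table[4] = {2, 44, 23, 8};
  // x & 3 is always a valid index, even if the branch above is mispredicted.
  return table[x & 3u];
}
```

This corresponds to the power-of-two variant of SecureBoundedOffset mentioned earlier in the thread, applied by the compiler rather than by the programmer.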

The exploitable code might come out of some non-obvious lowering. Even if a pointer points to something "obviously" safe, it might actually be uninitialized along some impossible path through the function. Probably the easiest example to explain is polly's invariant load hoisting. Basically, if you have a loop like this:
int sum = 0;
for (int i = 0; i < n; ++i) {
  if (b)
    sum += (*p)[i];
}
It gets transformed to something like this:
int *pp;
if (b) pp = *p;
int sum = 0;
for (int i = 0; i < n; ++i) {
  if (b)
    sum += pp[i];
}
So now you have the if() if() pattern which leads to a speculated load from an uninitialized pointer.

Thanks for the clear examples, Eli.
I wonder if anyone on this thread already has thought about whether it would be practically possible to make those transformations not introduce such a pattern, e.g. under a specific code generation option? Or if a transformation introduces such a pattern, whether it would be feasible for it to make use of whatever the intrinsic is we end up with?

In D41761#990166, @efriedma wrote:

In D41761#989799, @efriedma wrote:

How can two variant-1 attacks be "different" enough that a speculationsafeload would protect against one but not the other, when both exploit the same load operation

Sorry, wasn't quite clear. There are two speculated loads for a variant-1 attack: the load that reads the secret, and the load that leaks the secret to the user. The first load has to be speculation-safe to stop the attack; whether the second is speculation-safe is irrelevant, at least in the proposals so far. That isn't really a fundamental problem, just an illustration that reasoning about these attacks is tricky.

Wait, no, this isn't right. It is actually possible to attack a speculationsafeload, at least in theory.

In general, the logic behind the "soft" speculation barriers being proposed is that they block execution if a speculation check fails. But given that the speculation check is itself computed speculatively, you have to ensure that not only is the result correct in the normal sense, but also that it's correct when computed speculatively. And as I showed earlier, IR transforms can turn an arbitrary value into a speculatively-uninitialized value. Therefore, the speculation check can spuriously succeed, and the speculation barrier doesn't reliably prevent speculation.

I found the last sentence a bit cryptic. Even though I think I understand what you're pointing out here, I would like to check that my understanding is correct with the following example (using the SecureBoundedOffset API @gromer proposed):

if (Wednesday) {
   limit = 2;
} else {
   limit = 3;
}

if (a < 0 || a >= limit)
   return;

a = SecureBoundedOffset(a, limit);
v = b[a];

Here limit itself can have a value based on misspeculation: what happens when Wednesday is true, a==2, and both if statements get mispredicted?
I hope this is an example of what you were referring to @eli.friedman? Or did I misunderstand?
If so, I think there are ways to use the intrinsic multiple times to protect against this - but I'm not sure that's how you envision the use of the intrinsic, @gromer?

I hope this is an example of what you were referring to @eli.friedman? Or did I misunderstand?

That's the basic idea, yes. (Of course, like I mentioned earlier, there are a variety of ways to end up with assembly like that even if there isn't an explicit if statement in the source code.)

In D41761#991887, @kristof.beyls wrote:
Thanks very much for sharing this, Geoff!
I have a few immediate questions/thoughts, and wonder what you and others here think about them:

Lowering to various instruction sets.

I think one of the questions to ask is whether the APIs here can be lowered well to different instruction sets.
I believe that may be possible for Arm, but I'm still looking into it. It would be useful for us to have a few examples of how these APIs are envisioned to be used in practice, to make sure we understand the proposal well enough. E.g. maybe a few examples in the same spirit as under “More complex cases” at https://developer.arm.com/support/security-update/compiler-support-for-mitigations? Do you happen to have a suggestion of how this intrinsic would best be lowered on some instruction sets other than Arm? There was quite a bit of debate about the ability to efficiently lower the intrinsic in our proposal on the gcc mailing list, e.g. see https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01136.html.

Focusing on ProtectFromSpeculation API style, it can be lowered *very* efficiently on x86 at least:

  xor  %zero_reg, %zero_reg
  ...
  cmpq %a, 42
  jb below
  ja above
equal:
  cmovne %zero_reg, %arg0
  cmovne %zero_reg, %arg1
  ...
  cmovne %zero_reg, %argN
  ...

below:
  cmovnb %zero_reg, %arg0
  cmovnb %zero_reg, %arg1
  ...
  cmovnb %zero_reg, %argN
  ...

above:
  cmovna %zero_reg, %arg0
  cmovna %zero_reg, %arg1
  ...
  cmovna %zero_reg, %argN
  ...

Especially in the common case of 1 or 2 arguments needing to be zeroed this ends up being quite nice code generation using the existing structure of the conditional branch.

Is the API available in C too?

IIUC, the SecureBoundedOffset intrinsic is usable from both C and C++, but the IsPointerInRange intrinsic can only be used from C++? Do you have ideas around bringing similar functionality to C?

Yeah, this is part of why I suggest the much more generic ProtectFromSpeculation API which I think is easily applicable in C. The C version might use pointers or whatever, but this kind of API doesn't fundamentally require any interesting lang

For the variant with a general predicate (bool ProtectFromSpeculation(bool predicate, ZeroArgs&... args);); do you have ideas about how to make sure that the optimizers in the compiler don’t simplify the predicate too much?

For example:
if (a>7) {
  x = a;
  if (ProtectFromSpeculation(a>5, x)) {
     ... = v[x];
     ...
  }
}
how to prevent this from being optimized to:
if (a>7) {
  x = a;
  if (ProtectFromSpeculation(true, x)) {
    ... = v[x];
    ...
  }
}
which leads to no longer giving protection.

No matter what, this will require deep compiler support to implement. Even without the example you give, these constructs fundamentally violate the rules the optimizer uses: they are by definition no-ops for execution of the program!

This means we will have to work to build up specific and dedicated infrastructure in the compiler to model these as having special semantics. That exact infrastructure can provide whatever optimization barriers are necessary to get the desired behavior. For example, the code generation I suggest above for x86 cannot be implemented in LLVM using its IR alone (I've actually tried). We'll have to model this both in the IR and even in the code generator specially in order to produce the kind of specific pattern that is necessary.

But there is also the question of what burden do we want to place on the user of these intrinsics vs. what performance hit we're willing to accept due to optimization barriers. I could imagine two approaches here:

It is the programmer's responsibility to correctly protect any predicates that their application is sensitive to. As a consequence, if the a>5 predicate is sensitive for the application, so must the a>7 predicate be, and it is the programmer's responsibility to protect both of them. This allows the implementation to have the minimal set of optimization barriers, but may make it difficult for programmers to use correctly.

The predicate provided to these APIs is truly special and is forced to be a *dynamic* predicate. That is, we require the compiler to emit the predicate as if no preconditions existed. There are ways to model this in LLVM and I assume any compiler. As a trivial (but obviously bad) example: all references to variables within the predicate could be lowered by rinsing that SSA value through an opaque construct like inline asm.
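The "rinsing through inline asm" idea can be sketched with the empty-asm idiom that GCC and Clang accept. `OpaqueValue` and `DynamicGreaterThan` are hypothetical names; the `"+r"` constraint assumes a value that fits in a general-purpose register, and the fallback branch for other compilers is only a rough approximation. This hides the value from the optimizer so the comparison is emitted dynamically; on its own it says nothing about what the CPU does speculatively.

```cpp
// Hypothetical helper: makes the optimizer forget what it knows about value.
template <typename T>
inline T OpaqueValue(T value) {
#if defined(__GNUC__) || defined(__clang__)
  // Empty asm that claims to read and write value in a register, so facts
  // such as "value > 7" cannot be carried across it.
  asm volatile("" : "+r"(value));
#else
  // Rough fallback for other compilers: force a round trip through memory.
  volatile T tmp = value;
  value = tmp;
#endif
  return value;
}

// The predicate is evaluated on the laundered value, so an enclosing
// condition cannot fold it to a constant.
inline bool DynamicGreaterThan(long a, long threshold) {
  return OpaqueValue(a) > threshold;
}
```

In the earlier example this would correspond to writing the inner predicate as `DynamicGreaterThan(a, 5)` instead of `a>5`, at the cost of blocking other optimizations on `a` at that point.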

There is clearly a "programmer ease / security" vs. "better optimization" tradeoff between the two. If one isn't *clearly* the correct choice in all cases, we could even expose both behind separate APIs that try to make it clear the extent of protections provided.

Does that make sense?

In D41761#996858, @chandlerc wrote:
Yeah, this is part of why I suggest the much more generic ProtectFromSpeculation API which I think is easily applicable in C. The C version might use pointers or whatever, but this kind of API doesn't fundamentally require any interesting lang

... errr, lemme try that sentence again.

... any interesting language facilities or extensions.



Thanks for sharing your thoughts, Chandler.
Yes, this does make a lot of sense to me. I was thinking roughly along the same lines - apart from maybe fixating more on a single solution. Your above thoughts for potentially 2 related solutions - or at least experimenting with the 2 related solutions - make a lot of sense to me.
I would expect that only providing option (1) may not be ideal - as it may be hard in some cases to protect all flows, e.g. when all possible cross-translation-unit, inter-procedural flows also need to be considered.
Instinctively, I would expect that (2) might be more appealing to many programmers - if the performance overhead of it wouldn't be too high. I also expect that at least a pretty dumb way to force the predicate to be dynamic would not be too hard to implement in a front-end. Although I say that having hardly ever written any front-end code in practice - it just seems at least conceptually not overly hard. I hope to find some time in the near future to experiment with this. In the meantime, if you, or anyone else, has any further insights to share: I'm all ears!

Thanks!

In D41761#996858, @chandlerc wrote:

There is clearly a "programmer ease / security" vs. "better optimization" tradeoff between the two. If one isn't *clearly* the correct choice in all cases, we could even expose both behind separate APIs that try to make it clear the extent of protections provided.

In my experience, relying on programmers to get it right will inevitably fail. When there's a correctness issue, usually mistakes of that kind can be caught; however, security is not generally part of the "correctness" mindset of programmers, even people who should know better. I once had somebody tell me, with a straight face, that an obviously insecure system call was okay because it was an unpublished API and therefore could not be abused.

Security-related intrinsics, more so than most APIs, should be easy to use correctly and hard to use incorrectly.

atanasyan added a subscriber: atanasyan. Jun 19 2018, 6:15 AM

Herald added a reviewer: javed.absar. Jun 19 2018, 6:15 AM

kristof.beyls added a child revision: D49071: Enable lowering of llvm.speculation_safe_value to DSB/ISB pair. Jul 9 2018, 4:43 AM

kristof.beyls removed a child revision: D49071: Enable lowering of llvm.speculation_safe_value to DSB/ISB pair.

mloud added a subscriber: mloud. Sep 25 2021, 11:39 AM

Herald added a subscriber: jdoerfert. Sep 25 2021, 11:39 AM

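Revision Contents

Path                                                    Size
docs/LangRef.rst                                        49 lines
include/llvm/CodeGen/ISDOpcodes.h                       4 lines
include/llvm/CodeGen/SelectionDAGNodes.h                6 lines
include/llvm/IR/Intrinsics.td                           17 lines
include/llvm/Target/TargetSelectionDAG.td               15 lines
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp       53 lines
lib/CodeGen/SelectionDAG/LegalizeTypes.h                5 lines
lib/CodeGen/SelectionDAG/SelectionDAG.cpp               3 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp        57 lines
lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp         3 lines
lib/Target/AArch64/AArch64AsmPrinter.cpp                113 lines
lib/Target/AArch64/AArch64ISelLowering.cpp              72 lines
lib/Target/AArch64/AArch64InstrInfo.cpp                 12 lines
lib/Target/AArch64/AArch64InstrInfo.td                  94 lines
lib/Target/ARM/ARMAsmPrinter.h                          4 lines
lib/Target/ARM/ARMAsmPrinter.cpp                        308 lines
lib/Target/ARM/ARMISelLowering.cpp                      119 lines
lib/Target/ARM/ARMInstrInfo.td                          89 lines
lib/Target/ARM/ARMInstrThumb2.td                        91 lines
test/CodeGen/AArch64/no-speculate.ll                    255 lines
test/CodeGen/ARM/no-speculate.ll                        243 lines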

Diff 128727

docs/LangRef.rst

 None.

 Semantics:
 """"""""""

 This intrinsic actually does nothing, but optimizers must assume that it
 has externally observable side effects.


+'``llvm.nospeculateload``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+Syntax:
+"""""""

+This is an overloaded intrinsic. You can use llvm.nospeculateload on any
+integer type that can legally be loaded on the target, and any pointer type.
+However, not all targets support this intrinsic at the moment.

+::

+      declare T @llvm.nospeculateload.T(T* %ptr, i8* %lower_bound, i8* %upper_bound,
+                                        T failval, i8* %cmpptr)
+      declare T @llvm.nospeculateload_nolower.T(T* %ptr, i8* %upper_bound,
+                                                T failval, i8* %cmpptr)
+      declare T @llvm.nospeculateload_noupper.T(T* %ptr, i8* %lower_bound,
+                                                T failval, i8* %cmpptr)

+Overview:
+"""""""""

+The '``llvm.nospeculateload``' intrinsic.

+Arguments:
+""""""""""

+The first argument is a pointer, the second is a pointer used as a lower bound,
+the third is a pointer to an upper bound.
+The fourth argument is a value of the overloaded type.
+The fifth argument is a pointer.

+In the ``llvm.nospeculateload_nolower`` and ``llvm.nospeculateload_noupper``
+variants, the lower and upper bound arguments are missing respectively.

+Semantics:
+""""""""""

+If %cmpptr lies within the range (%lower_bound <= %cmpptr < %upper_bound) then
+the value at address %ptr is returned. Otherwise, value %failval is returned.

+Furthermore, the builtin will ensure that if %ptr is dereferenced speculatively
+at execution time (that is, without checking the boundary conditions) the
+result will not be used for further speculation unless the boundary conditions
+are satisfied. Speculation may continue, however, using failval as the
+speculative result.


 Stack Map Intrinsics
 --------------------

 LLVM provides experimental intrinsics to support runtime patching
 mechanisms commonly desired in dynamic language JITs. These intrinsics
 are described in :doc:`StackMaps`.

 Element Wise Atomic Memory Intrinsics
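
The architectural half of the semantics in the LangRef text above can be written as a small reference model. This is only a sketch: the `*_model` function names are hypothetical, plain C++ pointers stand in for the IR pointer types, and the speculation guarantee (the part that actually matters for variant 1) cannot be expressed in portable C++ at all.

```cpp
#include <cstdint>

// Hypothetical reference model of llvm.nospeculateload: return *ptr when
// lower_bound <= cmpptr < upper_bound, otherwise return failval.
// Only the architecturally visible result is modelled here.
template <typename T>
T nospeculateload_model(const T *ptr, const void *lower_bound,
                        const void *upper_bound, T failval,
                        const void *cmpptr) {
  auto c = reinterpret_cast<std::uintptr_t>(cmpptr);
  auto lo = reinterpret_cast<std::uintptr_t>(lower_bound);
  auto hi = reinterpret_cast<std::uintptr_t>(upper_bound);
  return (lo <= c && c < hi) ? *ptr : failval;
}

// Model of the _nolower variant: only the upper bound is checked.
template <typename T>
T nospeculateload_nolower_model(const T *ptr, const void *upper_bound,
                                T failval, const void *cmpptr) {
  auto c = reinterpret_cast<std::uintptr_t>(cmpptr);
  auto hi = reinterpret_cast<std::uintptr_t>(upper_bound);
  return (c < hi) ? *ptr : failval;
}

// Model of the _noupper variant: only the lower bound is checked.
template <typename T>
T nospeculateload_noupper_model(const T *ptr, const void *lower_bound,
                                T failval, const void *cmpptr) {
  auto c = reinterpret_cast<std::uintptr_t>(cmpptr);
  auto lo = reinterpret_cast<std::uintptr_t>(lower_bound);
  return (lo <= c) ? *ptr : failval;
}
```

Note that `ptr` and `cmpptr` are separate arguments: the address that is range-checked need not be the address that is loaded, although in the common array-indexing case they would be the same.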

include/llvm/CodeGen/ISDOpcodes.h

@@ enum NodeType {
     /// These reductions are non-strict, and have a single vector operand.
     VECREDUCE_FADD, VECREDUCE_FMUL,
     VECREDUCE_ADD, VECREDUCE_MUL,
     VECREDUCE_AND, VECREDUCE_OR, VECREDUCE_XOR,
     VECREDUCE_SMAX, VECREDUCE_SMIN, VECREDUCE_UMAX, VECREDUCE_UMIN,
     /// FMIN/FMAX nodes can have flags, for NaN/NoNaN variants.
     VECREDUCE_FMAX, VECREDUCE_FMIN,

+    NOSPECULATELOAD,
+    NOSPECULATELOAD_NOLOWER,
+    NOSPECULATELOAD_NOUPPER,

     /// BUILTIN_OP_END - This must be the last enum value in this list.
     /// The target-specific pre-isel opcode values start here.
     BUILTIN_OP_END
   };

   /// FIRST_TARGET_MEMORY_OPCODE - Target-specific pre-isel operations
   /// which do not reference a specific memory location should be less than
   /// this value. Those that do must not be less than this value, and can

include/llvm/CodeGen/SelectionDAGNodes.h

@@ public:

   // Methods to support isa and dyn_cast
   static bool classof(const SDNode *N) {
     // For some targets, we lower some target intrinsics to a MemIntrinsicNode
     // with either an intrinsic or a target opcode.
     return N->getOpcode() == ISD::LOAD ||
            N->getOpcode() == ISD::STORE ||
            N->getOpcode() == ISD::PREFETCH ||
+           N->getOpcode() == ISD::NOSPECULATELOAD ||
+           N->getOpcode() == ISD::NOSPECULATELOAD_NOLOWER ||
+           N->getOpcode() == ISD::NOSPECULATELOAD_NOUPPER ||
            N->getOpcode() == ISD::ATOMIC_CMP_SWAP ||
            N->getOpcode() == ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS ||
            N->getOpcode() == ISD::ATOMIC_SWAP ||
            N->getOpcode() == ISD::ATOMIC_LOAD_ADD ||
            N->getOpcode() == ISD::ATOMIC_LOAD_SUB ||
            N->getOpcode() == ISD::ATOMIC_LOAD_AND ||
            N->getOpcode() == ISD::ATOMIC_LOAD_OR ||
            N->getOpcode() == ISD::ATOMIC_LOAD_XOR ||
@@ public:
   }

   // Methods to support isa and dyn_cast
   static bool classof(const SDNode *N) {
     // We lower some target intrinsics to their target opcode
     // early a node with a target opcode can be of this class
     return N->isMemIntrinsic() ||
            N->getOpcode() == ISD::PREFETCH ||
+           N->getOpcode() == ISD::NOSPECULATELOAD ||
+           N->getOpcode() == ISD::NOSPECULATELOAD_NOLOWER ||
+           N->getOpcode() == ISD::NOSPECULATELOAD_NOUPPER ||
            N->isTargetMemoryOpcode();
   }
 };

 /// This SDNode is used to implement the code generator
 /// support for the llvm IR shufflevector instruction. It combines elements
 /// from two input vectors into a new input vector, with the selection and
 /// ordering of elements determined by an array of integers, referred to as

include/llvm/IR/Intrinsics.td

 def int_experimental_vector_reduce_fmin : Intrinsic<[llvm_anyfloat_ty],
                                                     [llvm_anyvector_ty],
                                                     [IntrNoMem]>;

 //===----- Intrinsics that are used to provide predicate information -----===//

 def int_ssa_copy : Intrinsic<[llvm_any_ty], [LLVMMatchType<0>],
                              [IntrNoMem, Returned<0>]>;

+//===------------------ Intrinsics to avoid speculation ------------------===//
+def int_nospeculateload
+  : Intrinsic<[llvm_any_ty],
+              [LLVMPointerTo<0>, llvm_ptr_ty, llvm_ptr_ty, LLVMMatchType<0>, llvm_ptr_ty],
+              []>;
+def int_nospeculateload_nolower
+  : Intrinsic<[llvm_any_ty],
+              [LLVMPointerTo<0>, llvm_ptr_ty, LLVMMatchType<0>, llvm_ptr_ty],
+              []>;
+def int_nospeculateload_noupper
+  : Intrinsic<[llvm_any_ty],
+              [LLVMPointerTo<0>, llvm_ptr_ty, LLVMMatchType<0>, llvm_ptr_ty],
+              []>;



 //===----------------------------------------------------------------------===//
 // Target-specific intrinsics
 //===----------------------------------------------------------------------===//

 include "llvm/IR/IntrinsicsPowerPC.td"
 include "llvm/IR/IntrinsicsX86.td"
 include "llvm/IR/IntrinsicsARM.td"
 include "llvm/IR/IntrinsicsAArch64.td"
 include "llvm/IR/IntrinsicsXCore.td"
 include "llvm/IR/IntrinsicsHexagon.td"
 include "llvm/IR/IntrinsicsNVVM.td"
 include "llvm/IR/IntrinsicsMips.td"
 include "llvm/IR/IntrinsicsAMDGPU.td"
 include "llvm/IR/IntrinsicsBPF.td"
 include "llvm/IR/IntrinsicsSystemZ.td"
 include "llvm/IR/IntrinsicsWebAssembly.td"

include/llvm/Target/TargetSelectionDAG.td

 def SDTAtomicLoad : SDTypeProfile<1, 1, [
   SDTCisInt<0>, SDTCisPtrTy<1>
 ]>;

 def SDTConvertOp : SDTypeProfile<1, 5, [ //cvtss, su, us, uu, ff, fs, fu, sf, su
   SDTCisVT<2, OtherVT>, SDTCisVT<3, OtherVT>, SDTCisPtrTy<4>, SDTCisPtrTy<5>
 ]>;

+def SDTNoSpeculateLoad: SDTypeProfile<1, 5, [
+  SDTCisPtrTy<1>, SDTCisPtrTy<2>, SDTCisPtrTy<3>, SDTCisSameAs<4, 0>, SDTCisPtrTy<5>
+]>;

+def SDTNoSpeculateLoadOneCheck: SDTypeProfile<1, 4, [
+  SDTCisPtrTy<1>, SDTCisPtrTy<2>, SDTCisSameAs<3, 0>, SDTCisPtrTy<4>
+]>;

 class SDCallSeqStart<list<SDTypeConstraint> constraints> :
   SDTypeProfile<0, 2, constraints>;
 class SDCallSeqEnd<list<SDTypeConstraint> constraints> :
   SDTypeProfile<0, 2, constraints>;

 //===----------------------------------------------------------------------===//
 // Selection DAG Node definitions.
 //
@@
 def intrinsic_wo_chain : SDNode<"ISD::INTRINSIC_WO_CHAIN",
                                 SDTypeProfile<1, -1, [SDTCisPtrTy<1>]>, []>;

 def SDT_assertext : SDTypeProfile<1, 1,
   [SDTCisInt<0>, SDTCisInt<1>, SDTCisSameAs<1, 0>]>;
 def assertsext : SDNode<"ISD::AssertSext", SDT_assertext>;
 def assertzext : SDNode<"ISD::AssertZext", SDT_assertext>;

+def nospeculateload : SDNode<"ISD::NOSPECULATELOAD", SDTNoSpeculateLoad,
+                             [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
+def nospeculateload_nolower : SDNode<"ISD::NOSPECULATELOAD_NOLOWER", SDTNoSpeculateLoadOneCheck,
+                                     [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
+def nospeculateload_noupper : SDNode<"ISD::NOSPECULATELOAD_NOUPPER", SDTNoSpeculateLoadOneCheck,
+                                     [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;


 //===----------------------------------------------------------------------===//
 // Selection DAG Condition Codes

 class CondCode; // ISD::CondCode enums
 def SETOEQ : CondCode; def SETOGT : CondCode;
 def SETOGE : CondCode; def SETOLT : CondCode; def SETOLE : CondCode;
 def SETONE : CondCode; def SETO : CondCode; def SETUO : CondCode;

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

@@ #endif
   case ISD::ATOMIC_LOAD_UMAX:
   case ISD::ATOMIC_SWAP:
     Res = PromoteIntRes_Atomic1(cast<AtomicSDNode>(N)); break;

   case ISD::ATOMIC_CMP_SWAP:
   case ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS:
     Res = PromoteIntRes_AtomicCmpSwap(cast<AtomicSDNode>(N), ResNo);
     break;
+  case ISD::NOSPECULATELOAD:
+    Res = PromoteIntRes_NOSPECULATELOAD(cast<MemSDNode>(N));
+    break;
+  case ISD::NOSPECULATELOAD_NOLOWER:
+  case ISD::NOSPECULATELOAD_NOUPPER:
+    Res = PromoteIntRes_NOSPECULATELOAD_OneCheck(cast<MemSDNode>(N));
+    break;
   }

   // If the result is null then the sub-method took care of registering it.
   if (Res.getNode())
     SetPromotedInteger(SDValue(N, ResNo), Res);
 }

 SDValue DAGTypeLegalizer::PromoteIntRes_MERGE_VALUES(SDNode *N,
@@ SDValue DAGTypeLegalizer::PromoteIntRes_SimpleIntBinOp(SDNode *N) {
   // these operations don't care. They may have weird bits going out, but
   // that too is okay if they are integer operations.
   SDValue LHS = GetPromotedInteger(N->getOperand(0));
   SDValue RHS = GetPromotedInteger(N->getOperand(1));
   return DAG.getNode(N->getOpcode(), SDLoc(N),
                      LHS.getValueType(), LHS, RHS);
 }

+SDValue DAGTypeLegalizer::PromoteIntRes_NOSPECULATELOAD(MemSDNode *N) {
+  SDValue FailVal = GetPromotedInteger(N->getOperand(4));
+  SDVTList NodeTys = DAG.getVTList(FailVal.getValueType(), MVT::Other);

+  SDValue Ops[] = {
+    N->getOperand(0), // Chain
+    N->getOperand(1), // Ptr
+    N->getOperand(2), // LowerBound
+    N->getOperand(3), // UpperBound
+    FailVal,
+    N->getOperand(5), // CmpPtr
+  };

+  SDValue Result = DAG.getMemIntrinsicNode(
+      ISD::NOSPECULATELOAD, SDLoc(N), NodeTys, Ops,
+      SDValue(N, 0).getValueType(), N->getMemOperand());

+  // Modified the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));

+  return Result;
+}

+SDValue DAGTypeLegalizer::PromoteIntRes_NOSPECULATELOAD_OneCheck(MemSDNode *N) {
+  SDValue FailVal = GetPromotedInteger(N->getOperand(3));
+  SDVTList NodeTys = DAG.getVTList(FailVal.getValueType(), MVT::Other);

+  SDValue Ops[] = {
+    N->getOperand(0), // Chain
+    N->getOperand(1), // Ptr
+    N->getOperand(2), // Bound
+    FailVal,
+    N->getOperand(4), // CmpPtr
+  };
+  SDValue Result = DAG.getMemIntrinsicNode(
+      N->getOpcode(), SDLoc(N), NodeTys, Ops,
+      SDValue(N, 0).getValueType(), N->getMemOperand());

+  // Modified the chain result - switch anything that used the old chain to
+  // use the new one.
+  ReplaceValueWith(SDValue(N, 1), Result.getValue(1));

+  return Result;
+}

 SDValue DAGTypeLegalizer::PromoteIntRes_SExtIntBinOp(SDNode *N) {
   // Sign extend the input.
   SDValue LHS = SExtPromotedInteger(N->getOperand(0));
   SDValue RHS = SExtPromotedInteger(N->getOperand(1));
   return DAG.getNode(N->getOpcode(), SDLoc(N),
                      LHS.getValueType(), LHS, RHS);
 }

lib/CodeGen/SelectionDAG/LegalizeTypes.h

@@ private:
   SDValue PromoteIntRes_SRA(SDNode *N);
   SDValue PromoteIntRes_SRL(SDNode *N);
   SDValue PromoteIntRes_TRUNCATE(SDNode *N);
   SDValue PromoteIntRes_UADDSUBO(SDNode *N, unsigned ResNo);
   SDValue PromoteIntRes_ADDSUBCARRY(SDNode *N, unsigned ResNo);
   SDValue PromoteIntRes_UNDEF(SDNode *N);
   SDValue PromoteIntRes_VAARG(SDNode *N);
   SDValue PromoteIntRes_XMULO(SDNode *N, unsigned ResNo);
+  SDValue PromoteIntRes_NOSPECULATELOAD(MemSDNode *N);
+  SDValue PromoteIntRes_NOSPECULATELOAD_OneCheck(MemSDNode *N);

   // Integer Operand Promotion.
   bool PromoteIntegerOperand(SDNode *N, unsigned OperandNo);
   SDValue PromoteIntOp_ANY_EXTEND(SDNode *N);
   SDValue PromoteIntOp_ATOMIC_STORE(AtomicSDNode *N);
   SDValue PromoteIntOp_BITCAST(SDNode *N);
   SDValue PromoteIntOp_BUILD_PAIR(SDNode *N);
   SDValue PromoteIntOp_BR_CC(SDNode *N, unsigned OpNo);
@@ private:
   void ExpandIntRes_MINMAX (SDNode *N, SDValue &Lo, SDValue &Hi);

   void ExpandIntRes_SADDSUBO (SDNode *N, SDValue &Lo, SDValue &Hi);
   void ExpandIntRes_UADDSUBO (SDNode *N, SDValue &Lo, SDValue &Hi);
   void ExpandIntRes_XMULO (SDNode *N, SDValue &Lo, SDValue &Hi);

   void ExpandIntRes_ATOMIC_LOAD (SDNode *N, SDValue &Lo, SDValue &Hi);

+  void ExpandIntRes_NOSPECULATELOAD (MemSDNode *N, SDValue &Lo, SDValue &Hi);
+  void ExpandIntRes_NOSPECULATELOAD_OneCheck(MemSDNode *N, SDValue &Lo, SDValue &Hi);

   void ExpandShiftByConstant(SDNode *N, const APInt &Amt,
                              SDValue &Lo, SDValue &Hi);
   bool ExpandShiftWithKnownAmountBit(SDNode *N, SDValue &Lo, SDValue &Hi);
   bool ExpandShiftWithUnknownAmountBit(SDNode *N, SDValue &Lo, SDValue &Hi);

   // Integer Operand Expansion.
   bool ExpandIntegerOperand(SDNode *N, unsigned OperandNo);
   SDValue ExpandIntOp_BR_CC(SDNode *N);

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

 SDValue SelectionDAG::getMemIntrinsicNode(unsigned Opcode, const SDLoc &dl,
                                           SDVTList VTList,
                                           ArrayRef<SDValue> Ops, EVT MemVT,
                                           MachineMemOperand *MMO) {
   assert((Opcode == ISD::INTRINSIC_VOID ||
           Opcode == ISD::INTRINSIC_W_CHAIN ||
           Opcode == ISD::PREFETCH ||
+          Opcode == ISD::NOSPECULATELOAD ||
+          Opcode == ISD::NOSPECULATELOAD_NOUPPER ||
+          Opcode == ISD::NOSPECULATELOAD_NOLOWER ||
           Opcode == ISD::LIFETIME_START ||
           Opcode == ISD::LIFETIME_END ||
           ((int)Opcode <= std::numeric_limits<int>::max() &&
            (int)Opcode >= ISD::FIRST_TARGET_MEMORY_OPCODE)) &&
          "Opcode is not a memory-accessing opcode!");

   // Memoize the node unless it returns a flag.
   MemIntrinsicSDNode *N;

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

@@ case Intrinsic::prefetch: {
     DAG.setRoot(DAG.getMemIntrinsicNode(ISD::PREFETCH, sdl,
                                         DAG.getVTList(MVT::Other), Ops,
                                         EVT::getIntegerVT(*Context, 8),
                                         MachinePointerInfo(I.getArgOperand(0)),
                                         0, /* align */
                                         Flags));
     return nullptr;
   }
+  case Intrinsic::nospeculateload: {
+    SDValue Chain = getRoot();
+    EVT LoadVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    SDVTList NodeTys = DAG.getVTList(LoadVT, MVT::Other);

+    SDValue Ops[] {
+      Chain,
+      getValue(I.getArgOperand(0)), // Ptr
+      getValue(I.getArgOperand(1)), // LowerBound
+      getValue(I.getArgOperand(2)), // UpperBound
+      getValue(I.getArgOperand(3)), // Failval
+      getValue(I.getArgOperand(4)), // CmpPtr
+    };

+    SDValue Result = DAG.getMemIntrinsicNode(
+        ISD::NOSPECULATELOAD, sdl, NodeTys, Ops, LoadVT,
+        MachinePointerInfo(I.getArgOperand(0)), 0,
+        MachineMemOperand::MOLoad | MachineMemOperand::MOVolatile);

+    assert(Result.getNode()->getNumValues() == 2);
+    SDValue OutChain = Result.getValue(1);
+    DAG.setRoot(OutChain);
+    SDValue LoadedVal = Result.getValue(0);
+    setValue(&I, LoadedVal);

+    return nullptr;
+  }
+  case Intrinsic::nospeculateload_nolower:
+  case Intrinsic::nospeculateload_noupper: {
+    SDValue Chain = getRoot();
+    EVT LoadVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
+    SDVTList NodeTys = DAG.getVTList(LoadVT, MVT::Other);

+    SDValue Ops[] = {
+      Chain,
+      getValue(I.getArgOperand(0)), // Ptr
+      getValue(I.getArgOperand(1)), // Bound
+      getValue(I.getArgOperand(2)), // FailVal
+      getValue(I.getArgOperand(3)) // CmpPtr
+    };

+    unsigned Opcode = Intrinsic == Intrinsic::nospeculateload_nolower
+                          ? ISD::NOSPECULATELOAD_NOLOWER
+                          : ISD::NOSPECULATELOAD_NOUPPER;
+    SDValue Result = DAG.getMemIntrinsicNode(
+        Opcode, sdl, NodeTys, Ops, LoadVT,
+        MachinePointerInfo(I.getArgOperand(0)), 0,
+        MachineMemOperand::MOLoad | MachineMemOperand::MOVolatile);

+    assert(Result.getNode()->getNumValues() == 2);
+    SDValue OutChain = Result.getValue(1);
+    DAG.setRoot(OutChain);
+    SDValue LoadedVal = Result.getValue(0);
+    setValue(&I, LoadedVal);

+    return nullptr;
+  }
   case Intrinsic::lifetime_start:
   case Intrinsic::lifetime_end: {
     bool IsStart = (Intrinsic == Intrinsic::lifetime_start);
     // Stack coloring is not enabled in O0, discard region information.
     if (TM.getOptLevel() == CodeGenOpt::None)
       return nullptr;

     SmallVector<Value *, 4> Allocas;

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

@@ if (G) {
         if (Name) return Name;
       return "<<Unknown Target Node #" + utostr(getOpcode()) + ">>";
     }
     return "<<Unknown Node #" + utostr(getOpcode()) + ">>";

 #ifndef NDEBUG
   case ISD::DELETED_NODE: return "<<Deleted Node!>>";
 #endif
+  case ISD::NOSPECULATELOAD: return "NoSpeculateLoad";
+  case ISD::NOSPECULATELOAD_NOLOWER: return "NoSpeculateLoadNoLower";
+  case ISD::NOSPECULATELOAD_NOUPPER: return "NoSpeculateLoadNoUpper";
   case ISD::PREFETCH: return "Prefetch";
   case ISD::ATOMIC_FENCE: return "AtomicFence";
   case ISD::ATOMIC_CMP_SWAP: return "AtomicCmpSwap";
   case ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS: return "AtomicCmpSwapWithSuccess";
   case ISD::ATOMIC_SWAP: return "AtomicSwap";
   case ISD::ATOMIC_LOAD_ADD: return "AtomicLoadAdd";
   case ISD::ATOMIC_LOAD_SUB: return "AtomicLoadSub";
   case ISD::ATOMIC_LOAD_AND: return "AtomicLoadAnd";

lib/Target/AArch64/AArch64AsmPrinter.cpp

@@ public:
   void LowerSTACKMAP(MCStreamer &OutStreamer, StackMaps &SM,
                      const MachineInstr &MI);
   void LowerPATCHPOINT(MCStreamer &OutStreamer, StackMaps &SM,
                        const MachineInstr &MI);

   void LowerPATCHABLE_FUNCTION_ENTER(const MachineInstr &MI);
   void LowerPATCHABLE_FUNCTION_EXIT(const MachineInstr &MI);
   void LowerPATCHABLE_TAIL_CALL(const MachineInstr &MI);
+  void LowerNOSPECULATELOAD(const MachineInstr *MI, unsigned LoadOpc,
+                            bool NoLower, bool NoUpper, bool RegPair,
+                            bool XRegs);

   void EmitSled(const MachineInstr &MI, SledKind Kind);

   /// \brief tblgen'erated driver function for lowering simple MI->MC
   /// pseudo instructions.
   bool emitPseudoExpansionLowering(MCStreamer &OutStreamer,
                                    const MachineInstr *MI);

@@ void AArch64AsmPrinter::EmitSled(const MachineInstr &MI, SledKind Kind)

   for (int8_t I = 0; I < NoopsInSledCount; I++)
     EmitToStreamer(*OutStreamer, MCInstBuilder(AArch64::HINT).addImm(0));

   OutStreamer->EmitLabel(Target);
   recordSled(CurSled, MI, Kind);
 }
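
The `LowerNOSPECULATELOAD` function that follows emits a compare, an optional conditional compare, a conditional branch around the load, and a conditional select. The sketch below models only how the carry flag drives that sequence in the three variants, to make the HS/LO condition swaps easier to follow. Everything in it (`Flags`, `cmp`, `ccmp_hs`, `Variant`, `lowered_sequence`) is a hypothetical name for illustration, addresses are plain 64-bit integers, and the CSDB barrier and all speculative behaviour are left out.

```cpp
#include <cstdint>

// Only the carry flag matters for the HS (C set) and LO (C clear) conditions.
struct Flags { bool carry; };

// CMP a, b: C is set when the subtraction does not borrow, i.e. a >= b unsigned.
inline Flags cmp(std::uint64_t a, std::uint64_t b) { return {a >= b}; }

// CCMP a, b, #nzcv, HS: if HS currently holds, compare a with b; otherwise
// load the flags from the immediate (bit 1 of nzcv is the carry flag).
inline Flags ccmp_hs(Flags in, std::uint64_t a, std::uint64_t b, unsigned nzcv) {
  return in.carry ? cmp(a, b) : Flags{(nzcv & 2u) != 0};
}

enum class Variant { Both, NoLower, NoUpper };

// Architectural model of the emitted sequence; illustrative only.
inline std::uint64_t lowered_sequence(Variant v, const std::uint64_t *ptr,
                                      std::uint64_t lower, std::uint64_t upper,
                                      std::uint64_t failval,
                                      std::uint64_t cmpptr) {
  const bool noLower = v == Variant::NoLower;
  const bool noUpper = v == Variant::NoUpper;

  // CMP cmpptr, <first bound>
  Flags f = cmp(cmpptr, noLower ? upper : lower);
  // CCMP cmpptr, upper, 2, hs (only when both bounds are checked)
  if (!noLower && !noUpper)
    f = ccmp_hs(f, cmpptr, upper, 2);

  const bool hs = f.carry;
  const bool lo = !f.carry;

  // B.<cond> FAIL: the branch condition is HS, or LO for the no-upper variant.
  const bool branchTaken = noUpper ? lo : hs;
  std::uint64_t dst = 0; // stand-in for whatever dst held before the load
  if (!branchTaken)
    dst = *ptr; // LDR dst, [ptr]

  // CSEL dst, dst, failval, <cond>: keep dst on LO, or on HS for no-upper.
  const bool keepLoaded = noUpper ? hs : lo;
  return keepLoaded ? dst : failval;
}
```

With both bounds present, a `cmpptr` below `lower` makes the first compare clear the carry flag, so the CCMP does not compare and instead loads `#2` into the flags. That sets the carry flag, and the same `B.HS` branch handles both out-of-range cases.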

		void AArch64AsmPrinter::LowerNOSPECULATELOAD(const MachineInstr *MI,
		unsigned LoadOpc, bool NoLower,
		bool NoUpper, bool RegPair,
		bool XRegs) {
		unsigned Op = 0;
		unsigned Dst = MI->getOperand(Op++).getReg();
		unsigned DstHi = RegPair ? MI->getOperand(Op++).getReg() : 0;
		unsigned Ptr = MI->getOperand(Op++).getReg();
		unsigned LowerBound = NoLower ? 0 : MI->getOperand(Op++).getReg();
		unsigned UpperBound = NoUpper ? 0 : MI->getOperand(Op++).getReg();
		unsigned FailVal = MI->getOperand(Op++).getReg();
		unsigned FailValHi = RegPair ? MI->getOperand(Op++).getReg() : 0;
		unsigned CmpPtr = MI->getOperand(Op++).getReg();
		MCSymbol *FailLabel = OutContext.createTempSymbol();

		unsigned FirstBoundReg = NoLower ? UpperBound : LowerBound;
		unsigned BranchCond = AArch64CC::HS, CSelCond = AArch64CC::LO;
		if (NoUpper)
		std::swap(BranchCond, CSelCond);
		if (RegPair && !STI->isLittleEndian()) {
		std::swap(Dst, DstHi);
		std::swap(FailVal, FailValHi);
		}

		// CMP cmpptr, lower
		EmitToStreamer(*OutStreamer, MCInstBuilder(AArch64::SUBSXrs)
		.addReg(AArch64::XZR)
		.addReg(CmpPtr)
		.addReg(FirstBoundReg)
		.addImm(AArch64_AM::getShiftValue(0)));

		// CCMP cmpptr, upper, 2, hs
		if (!NoLower && !NoUpper)
		EmitToStreamer(*OutStreamer, MCInstBuilder(AArch64::CCMPXr)
		.addReg(CmpPtr)
		.addReg(UpperBound)
		.addImm(2)
		.addImm(AArch64CC::HS));

		// B.HS FAIL
		EmitToStreamer(*OutStreamer,
		MCInstBuilder(AArch64::Bcc)
		.addImm(BranchCond)
		.addExpr(MCSymbolRefExpr::create(FailLabel, OutContext)));

		// LDR{P||H|B} dst, [ptr]
		MCInstBuilder LDRBuilder(LoadOpc);
		LDRBuilder.addReg(Dst);
		if (RegPair) LDRBuilder.addReg(DstHi);
		LDRBuilder.addReg(Ptr).addImm(0); // Offset
		EmitToStreamer(*OutStreamer, LDRBuilder);

		// Label
		OutStreamer->EmitLabel(FailLabel);

		// CSEL dst, dst, failval, LO
		unsigned CselOpcode = XRegs ? AArch64::CSELXr : AArch64::CSELWr;
		EmitToStreamer(
		*OutStreamer,
		MCInstBuilder(CselOpcode)
		.addReg(Dst)
		.addReg(Dst)
		.addReg(FailVal)
		.addImm(CSelCond));
		if (RegPair)
		EmitToStreamer(
		*OutStreamer,
		MCInstBuilder(CselOpcode)
		.addReg(DstHi)
		.addReg(DstHi)
		.addReg(FailValHi)
		.addImm(CSelCond));

		// HINT #0x14 // CSDB
		EmitToStreamer(*OutStreamer, MCInstBuilder(AArch64::HINT).addImm(0x14));
		}

void AArch64AsmPrinter::EmitEndOfAsmFile(Module &M) {		void AArch64AsmPrinter::EmitEndOfAsmFile(Module &M) {
const Triple &TT = TM.getTargetTriple();		const Triple &TT = TM.getTargetTriple();
if (TT.isOSBinFormatMachO()) {		if (TT.isOSBinFormatMachO()) {
// Funny Darwin hack: This flag tells the linker that no global symbols		// Funny Darwin hack: This flag tells the linker that no global symbols
// contain code that falls through to other global symbols (e.g. the obvious		// contain code that falls through to other global symbols (e.g. the obvious
// implementation of multiple entry points). If this doesn't occur, the		// implementation of multiple entry points). If this doesn't occur, the
// linker can safely perform dead code stripping. Since LLVM never		// linker can safely perform dead code stripping. Since LLVM never
// generates code that does this, it is always safe to set.		// generates code that does this, it is always safe to set.
[465 lines not shown]	case AArch64::TLSDESC_CALLSEQ: {
MCInst Blr;		MCInst Blr;
Blr.setOpcode(AArch64::BLR);		Blr.setOpcode(AArch64::BLR);
Blr.addOperand(MCOperand::createReg(AArch64::X1));		Blr.addOperand(MCOperand::createReg(AArch64::X1));
EmitToStreamer(*OutStreamer, Blr);		EmitToStreamer(*OutStreamer, Blr);

return;		return;
}		}

		case AArch64::NoSpeculateLoadXPair_both:
		return LowerNOSPECULATELOAD(MI, AArch64::LDPXi, false, false, true, true);
		case AArch64::NoSpeculateLoadX_both:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRXui, false, false, false, true);
		case AArch64::NoSpeculateLoadW_both:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRWui, false, false, false, false);
		case AArch64::NoSpeculateLoadH_both:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRHHui, false, false, false, false);
		case AArch64::NoSpeculateLoadB_both:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRBBui, false, false, false, false);

		case AArch64::NoSpeculateLoadXPair_nolower:
		return LowerNOSPECULATELOAD(MI, AArch64::LDPXi, true, false, true, true);
		case AArch64::NoSpeculateLoadX_nolower:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRXui, true, false, false, true);
		case AArch64::NoSpeculateLoadW_nolower:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRWui, true, false, false, false);
		case AArch64::NoSpeculateLoadH_nolower:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRHHui, true, false, false, false);
		case AArch64::NoSpeculateLoadB_nolower:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRBBui, true, false, false, false);

		case AArch64::NoSpeculateLoadXPair_noupper:
		return LowerNOSPECULATELOAD(MI, AArch64::LDPXi, false, true, true, true);
		case AArch64::NoSpeculateLoadX_noupper:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRXui, false, true, false, true);
		case AArch64::NoSpeculateLoadW_noupper:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRWui, false, true, false, false);
		case AArch64::NoSpeculateLoadH_noupper:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRHHui, false, true, false, false);
		case AArch64::NoSpeculateLoadB_noupper:
		return LowerNOSPECULATELOAD(MI, AArch64::LDRBBui, false, true, false, false);

case AArch64::FMOVH0:		case AArch64::FMOVH0:
case AArch64::FMOVS0:		case AArch64::FMOVS0:
case AArch64::FMOVD0:		case AArch64::FMOVD0:
EmitFMov0(*MI);		EmitFMov0(*MI);
return;		return;

case TargetOpcode::STACKMAP:		case TargetOpcode::STACKMAP:
return LowerSTACKMAP(OutStreamer, SM, MI);		return LowerSTACKMAP(OutStreamer, SM, MI);
[29 lines not shown]
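
For readers following the AArch64 expansion in LowerNOSPECULATELOAD, the architectural (non-speculative) value that the emitted CMP / CCMP / B.cond / LDR / CSEL sequence leaves in dst can be summarised by a small host-side model. This is only an illustrative sketch: `modelNoSpeculateLoad` and its `std::optional` bounds are hypothetical names that do not appear in the patch, and the model deliberately says nothing about the CSDB barrier or about behaviour under speculation.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical host-side model (not part of the patch) of the value the
// expanded sequence leaves in dst. An absent bound stands for the
// _nolower / _noupper pseudo variants.
template <typename T>
T modelNoSpeculateLoad(const T *Ptr, std::optional<std::uintptr_t> Lower,
                       std::optional<std::uintptr_t> Upper, T FailVal,
                       std::uintptr_t CmpPtr) {
  // Unsigned comparisons, matching the HS/LO condition codes: the access
  // is in bounds when Lower <= CmpPtr < Upper.
  bool AboveLower = !Lower || CmpPtr >= *Lower;
  bool BelowUpper = !Upper || CmpPtr < *Upper;
  // The load is only performed on the in-bounds path; otherwise the
  // conditional select picks FailVal.
  return (AboveLower && BelowUpper) ? *Ptr : FailVal;
}
```

Passing `std::nullopt` for `Lower` or `Upper` models the single-check forms, where only one comparison is emitted and the branch and select conditions are chosen accordingly.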

lib/Target/AArch64/AArch64ISelLowering.cpp

[457 lines not shown]	if (Subtarget->hasFullFP16()) {
setOperationAction(ISD::FMINNUM, MVT::f16, Legal);		setOperationAction(ISD::FMINNUM, MVT::f16, Legal);
setOperationAction(ISD::FMAXNUM, MVT::f16, Legal);		setOperationAction(ISD::FMAXNUM, MVT::f16, Legal);
setOperationAction(ISD::FMINNAN, MVT::f16, Legal);		setOperationAction(ISD::FMINNAN, MVT::f16, Legal);
setOperationAction(ISD::FMAXNAN, MVT::f16, Legal);		setOperationAction(ISD::FMAXNAN, MVT::f16, Legal);
}		}

setOperationAction(ISD::PREFETCH, MVT::Other, Custom);		setOperationAction(ISD::PREFETCH, MVT::Other, Custom);

		setOperationAction(ISD::NOSPECULATELOAD, MVT::i128, Custom);
		setOperationAction(ISD::NOSPECULATELOAD_NOLOWER, MVT::i128, Custom);
		setOperationAction(ISD::NOSPECULATELOAD_NOUPPER, MVT::i128, Custom);

setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i128, Custom);		setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i128, Custom);

// Lower READCYCLECOUNTER using an mrs from PMCCNTR_EL0.		// Lower READCYCLECOUNTER using an mrs from PMCCNTR_EL0.
// This requires the Performance Monitors extension.		// This requires the Performance Monitors extension.
if (Subtarget->hasPerfMon())		if (Subtarget->hasPerfMon())
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);		setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);

if (getLibcallName(RTLIB::SINCOS_STRET_F32) != nullptr &&		if (getLibcallName(RTLIB::SINCOS_STRET_F32) != nullptr &&
[10,184 lines not shown]	static void ReplaceCMP_SWAP_128Results(SDNode *N,
MemOp[0] = cast<MemSDNode>(N)->getMemOperand();		MemOp[0] = cast<MemSDNode>(N)->getMemOperand();
cast<MachineSDNode>(CmpSwap)->setMemRefs(MemOp, MemOp + 1);		cast<MachineSDNode>(CmpSwap)->setMemRefs(MemOp, MemOp + 1);

Results.push_back(SDValue(CmpSwap, 0));		Results.push_back(SDValue(CmpSwap, 0));
Results.push_back(SDValue(CmpSwap, 1));		Results.push_back(SDValue(CmpSwap, 1));
Results.push_back(SDValue(CmpSwap, 3));		Results.push_back(SDValue(CmpSwap, 3));
}		}

		static void ReplaceNOSPECULATELOADResults(SDNode *N,
		SmallVectorImpl<SDValue> &Results,
		SelectionDAG &DAG) {
		assert(N->getValueType(0) == MVT::i128 &&
		"NOSPECULATELOAD on types less than 128 should be legal");
		auto FailVal = splitInt128(N->getOperand(4), DAG);
		SDValue Ops[] = {
		N->getOperand(1), // Ptr
		N->getOperand(2), // LowerBound
		N->getOperand(3), // UpperBound
		FailVal.first,
		FailVal.second,
		N->getOperand(5), // CmpPtr
		N->getOperand(0), // Chain
		};
		SDVTList NodeTys = DAG.getVTList(MVT::i64, MVT::i64, MVT::Other);
		MachineSDNode *NewNode = DAG.getMachineNode(
		AArch64::NoSpeculateLoadXPair_both, SDLoc(N), NodeTys, Ops);

		MachineFunction &MF = DAG.getMachineFunction();
		MachineSDNode::mmo_iterator MemOp = MF.allocateMemRefsArray(1);
		MemOp[0] = cast<MemSDNode>(N)->getMemOperand();
		NewNode->setMemRefs(MemOp, MemOp + 1);

		Results.push_back(DAG.getNode(ISD::BUILD_PAIR, SDLoc(N),
		MVT::i128, SDValue(NewNode, 0),
		SDValue(NewNode, 1)));
		Results.push_back(SDValue(NewNode, 2)); // Chain
		}

		static void ReplaceNOSPECULATELOADOneCheckResults(SDNode *N,
		SmallVectorImpl<SDValue> &Results,
		SelectionDAG &DAG) {
		assert(N->getValueType(0) == MVT::i128 &&
		"NOSPECULATELOAD on types less than 128 should be legal");
		auto FailVal = splitInt128(N->getOperand(3), DAG);
		SDValue Ops[] = {
		N->getOperand(1), // Ptr
		N->getOperand(2), // Bound
		FailVal.first,
		FailVal.second,
		N->getOperand(4), // CmpPtr
		N->getOperand(0), // Chain
		};
		SDVTList NodeTys = DAG.getVTList(MVT::i64, MVT::i64, MVT::Other);
		unsigned Opcode = N->getOpcode() == ISD::NOSPECULATELOAD_NOLOWER
		? AArch64::NoSpeculateLoadXPair_nolower
		: AArch64::NoSpeculateLoadXPair_noupper;
		MachineSDNode *NewNode = DAG.getMachineNode(Opcode, SDLoc(N), NodeTys, Ops);

		MachineFunction &MF = DAG.getMachineFunction();
		MachineSDNode::mmo_iterator MemOp = MF.allocateMemRefsArray(1);
		MemOp[0] = cast<MemSDNode>(N)->getMemOperand();
		NewNode->setMemRefs(MemOp, MemOp + 1);

		Results.push_back(DAG.getNode(ISD::BUILD_PAIR, SDLoc(N),
		MVT::i128, SDValue(NewNode, 0),
		SDValue(NewNode, 1)));
		Results.push_back(SDValue(NewNode, 2)); // Chain
		}

void AArch64TargetLowering::ReplaceNodeResults(		void AArch64TargetLowering::ReplaceNodeResults(
SDNode *N, SmallVectorImpl<SDValue> &Results, SelectionDAG &DAG) const {		SDNode *N, SmallVectorImpl<SDValue> &Results, SelectionDAG &DAG) const {
switch (N->getOpcode()) {		switch (N->getOpcode()) {
default:		default:
llvm_unreachable("Don't know how to custom expand this");		llvm_unreachable("Don't know how to custom expand this");
case ISD::BITCAST:		case ISD::BITCAST:
ReplaceBITCASTResults(N, Results, DAG);		ReplaceBITCASTResults(N, Results, DAG);
return;		return;
[26 lines not shown]	void AArch64TargetLowering::ReplaceNodeResults(
case ISD::FP_TO_UINT:		case ISD::FP_TO_UINT:
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
assert(N->getValueType(0) == MVT::i128 && "unexpected illegal conversion");		assert(N->getValueType(0) == MVT::i128 && "unexpected illegal conversion");
// Let normal code take care of it by not adding anything to Results.		// Let normal code take care of it by not adding anything to Results.
return;		return;
case ISD::ATOMIC_CMP_SWAP:		case ISD::ATOMIC_CMP_SWAP:
ReplaceCMP_SWAP_128Results(N, Results, DAG);		ReplaceCMP_SWAP_128Results(N, Results, DAG);
return;		return;
		case ISD::NOSPECULATELOAD:
		ReplaceNOSPECULATELOADResults(N, Results, DAG);
		return;
		case ISD::NOSPECULATELOAD_NOLOWER:
		case ISD::NOSPECULATELOAD_NOUPPER:
		ReplaceNOSPECULATELOADOneCheckResults(N, Results, DAG);
		return;
}		}
}		}

bool AArch64TargetLowering::useLoadStackGuardNode() const {		bool AArch64TargetLowering::useLoadStackGuardNode() const {
if (Subtarget->isTargetAndroid() || Subtarget->isTargetFuchsia())		if (Subtarget->isTargetAndroid() || Subtarget->isTargetFuchsia())
return TargetLowering::useLoadStackGuardNode();		return TargetLowering::useLoadStackGuardNode();
return true;		return true;
}		}
[261 lines not shown]

lib/Target/AArch64/AArch64InstrInfo.cpp

[3,004 lines not shown]	int llvm::isAArch64FrameOffsetLegal(const MachineInstr &MI, int &Offset,
case AArch64::LD1Threev1d:		case AArch64::LD1Threev1d:
case AArch64::LD1Fourv1d:		case AArch64::LD1Fourv1d:
case AArch64::ST1Twov2d:		case AArch64::ST1Twov2d:
case AArch64::ST1Threev2d:		case AArch64::ST1Threev2d:
case AArch64::ST1Fourv2d:		case AArch64::ST1Fourv2d:
case AArch64::ST1Twov1d:		case AArch64::ST1Twov1d:
case AArch64::ST1Threev1d:		case AArch64::ST1Threev1d:
case AArch64::ST1Fourv1d:		case AArch64::ST1Fourv1d:
		case AArch64::NoSpeculateLoadX_both:
		case AArch64::NoSpeculateLoadW_both:
		case AArch64::NoSpeculateLoadH_both:
		case AArch64::NoSpeculateLoadB_both:
		case AArch64::NoSpeculateLoadX_nolower:
		case AArch64::NoSpeculateLoadW_nolower:
		case AArch64::NoSpeculateLoadH_nolower:
		case AArch64::NoSpeculateLoadB_nolower:
		case AArch64::NoSpeculateLoadX_noupper:
		case AArch64::NoSpeculateLoadW_noupper:
		case AArch64::NoSpeculateLoadH_noupper:
		case AArch64::NoSpeculateLoadB_noupper:
return AArch64FrameOffsetCannotUpdate;		return AArch64FrameOffsetCannotUpdate;
case AArch64::PRFMui:		case AArch64::PRFMui:
Scale = 8;		Scale = 8;
UnscaledOp = AArch64::PRFUMi;		UnscaledOp = AArch64::PRFUMi;
break;		break;
case AArch64::LDRXui:		case AArch64::LDRXui:
Scale = 8;		Scale = 8;
UnscaledOp = AArch64::LDURXi;		UnscaledOp = AArch64::LDURXi;
[2,045 lines not shown]

lib/Target/AArch64/AArch64InstrInfo.td

	[1,373 lines not shown]
	def TLSDESC_CALLSEQ			def TLSDESC_CALLSEQ
	: Pseudo<(outs), (ins i64imm:$sym),			: Pseudo<(outs), (ins i64imm:$sym),
	[(AArch64tlsdesc_callseq tglobaltlsaddr:$sym)]>,			[(AArch64tlsdesc_callseq tglobaltlsaddr:$sym)]>,
	Sched<[WriteI, WriteLD, WriteI, WriteBrReg]>;			Sched<[WriteI, WriteLD, WriteI, WriteBrReg]>;
	def : Pat<(AArch64tlsdesc_callseq texternalsym:$sym),			def : Pat<(AArch64tlsdesc_callseq texternalsym:$sym),
	(TLSDESC_CALLSEQ texternalsym:$sym)>;			(TLSDESC_CALLSEQ texternalsym:$sym)>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
				// Speculation barrier intrinsics
				//===----------------------------------------------------------------------===//
				multiclass NoSpeculateLoad<RegisterClass ValueClass, code TypeCheck> {
				def _both_frag : PatFrag<(ops node:$ptr, node:$lower_bound,
				node:$upper_bound, node:$failval,
				node:$cmpptr),
				(nospeculateload node:$ptr, node:$lower_bound,
				node:$upper_bound, node:$failval,
				node:$cmpptr), TypeCheck>;
				def _nolower_frag : PatFrag<(ops node:$ptr,
				node:$upper_bound, node:$failval,
				node:$cmpptr),
				(nospeculateload_nolower node:$ptr,
				node:$upper_bound, node:$failval,
				node:$cmpptr), TypeCheck>;
				def _noupper_frag : PatFrag<(ops node:$ptr, node:$lower_bound,
				node:$failval, node:$cmpptr),
				(nospeculateload_noupper node:$ptr, node:$lower_bound,
				node:$failval,
				node:$cmpptr), TypeCheck>;

				let hasSideEffects = 1, isCodeGenOnly = 1, Defs = [NZCV], mayLoad = 1,
				Constraints = "@earlyclobber $dst" in {
				let Size = 24 in
				def _both
				: Pseudo<(outs ValueClass:$dst),
				(ins GPR64sp:$ptr, GPR64:$lower_bound,
				GPR64:$upper_bound, ValueClass:$failval, GPR64:$cmpptr),
				[(set ValueClass:$dst,
				(!cast<SDNode>(NAME # "_both_frag") GPR64sp:$ptr,
				GPR64:$lower_bound, GPR64:$upper_bound,
				ValueClass:$failval, GPR64:$cmpptr))]>,
				Sched<[]>;
				let Size = 20 in
				def _nolower
				: Pseudo<(outs ValueClass:$dst),
				(ins GPR64sp:$ptr, GPR64:$upper_bound,
				ValueClass:$failval, GPR64:$cmpptr),
				[(set ValueClass:$dst,
				(!cast<SDNode>(NAME # "_nolower_frag") GPR64sp:$ptr,
				GPR64:$upper_bound, ValueClass:$failval,
				GPR64:$cmpptr))]>,
				Sched<[]>;
				let Size = 20 in
				def _noupper
				: Pseudo<(outs ValueClass:$dst),
				(ins GPR64sp:$ptr, GPR64:$lower_bound,
				ValueClass:$failval, GPR64:$cmpptr),
				[(set ValueClass:$dst,
				(!cast<SDNode>(NAME # "_noupper_frag") GPR64sp:$ptr,
				GPR64:$lower_bound, ValueClass:$failval,
				GPR64:$cmpptr))]>,
				Sched<[]>;
				}
				}

				defm NoSpeculateLoadX: NoSpeculateLoad<GPR64,
				[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i64;}]>;
				defm NoSpeculateLoadW: NoSpeculateLoad<GPR32,
				[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i32;}]>;
				defm NoSpeculateLoadH: NoSpeculateLoad<GPR32,
				[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i16;}]>;
				defm NoSpeculateLoadB: NoSpeculateLoad<GPR32,
				[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i8;}]>;

				let hasSideEffects = 1, isCodeGenOnly = 1, Defs = [NZCV],
				mayLoad = 1, Constraints = "@earlyclobber $dstlo,@earlyclobber $dsthi" in {
				let Size = 28 in
				def NoSpeculateLoadXPair_both
				: Pseudo<(outs GPR64:$dstlo, GPR64:$dsthi),
				(ins GPR64sp:$ptr, GPR64:$lower_bound,
				GPR64:$upper_bound, GPR64:$failvallo,
				GPR64:$failvalhi, GPR64:$cmpptr),
				[]>,
				Sched<[]>;
				let Size = 24 in
				def NoSpeculateLoadXPair_nolower
				: Pseudo<(outs GPR64:$dstlo, GPR64:$dsthi),
				(ins GPR64sp:$ptr,
				GPR64:$upper_bound, GPR64:$failvallo,
				GPR64:$failvalhi, GPR64:$cmpptr),
				[]>,
				Sched<[]>;
				let Size = 24 in
				def NoSpeculateLoadXPair_noupper
				: Pseudo<(outs GPR64:$dstlo, GPR64:$dsthi),
				(ins GPR64sp:$ptr, GPR64:$lower_bound,
				GPR64:$failvallo,
				GPR64:$failvalhi, GPR64:$cmpptr),
				[]>,
				Sched<[]>;
				}

				//===----------------------------------------------------------------------===//
	// Conditional branch (immediate) instruction.			// Conditional branch (immediate) instruction.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	def Bcc : BranchCond;			def Bcc : BranchCond;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Compare-and-branch instructions.			// Compare-and-branch instructions.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	defm CBZ : CmpBranch<0, "cbz", AArch64cbz>;			defm CBZ : CmpBranch<0, "cbz", AArch64cbz>;
	[4,889 lines not shown]
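
The `Size` values on the pseudo-instructions (24/20 for the single-register forms, 28/24 for the XPair forms) appear to match the number of 4-byte instructions that LowerNOSPECULATELOAD emits for each variant. The arithmetic can be checked with the sketch below; `expandedSize` is a hypothetical helper written for this note, not something the patch defines.

```cpp
// Hypothetical helper (not in the patch): bytes occupied by the expanded
// sequence, assuming every AArch64 instruction is 4 bytes wide.
constexpr unsigned expandedSize(bool BothBounds, bool RegPair) {
  unsigned NumInsts = 1                      // CMP cmpptr, bound
                      + (BothBounds ? 1 : 0) // CCMP, only with two bounds
                      + 1                    // B.cond to the fail label
                      + 1                    // LDR / LDP
                      + (RegPair ? 2 : 1)    // one CSEL per destination reg
                      + 1;                   // HINT #0x14 (CSDB)
  return NumInsts * 4;
}

static_assert(expandedSize(true, false) == 24, "_both");
static_assert(expandedSize(false, false) == 20, "_nolower / _noupper");
static_assert(expandedSize(true, true) == 28, "XPair_both");
static_assert(expandedSize(false, true) == 24, "XPair_nolower / _noupper");
```

If the expansion ever gains or loses an instruction, the `Size` fields in the .td file would need to change with it, since they are stated by hand rather than derived.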

lib/Target/ARM/ARMAsmPrinter.h

[83 lines not shown]	bool PrintAsmMemoryOperand(const MachineInstr *MI, unsigned OpNum,
raw_ostream &O) override;		raw_ostream &O) override;

void emitInlineAsmEnd(const MCSubtargetInfo &StartInfo,		void emitInlineAsmEnd(const MCSubtargetInfo &StartInfo,
const MCSubtargetInfo *EndInfo) const override;		const MCSubtargetInfo *EndInfo) const override;

void EmitJumpTableAddrs(const MachineInstr *MI);		void EmitJumpTableAddrs(const MachineInstr *MI);
void EmitJumpTableInsts(const MachineInstr *MI);		void EmitJumpTableInsts(const MachineInstr *MI);
void EmitJumpTableTBInst(const MachineInstr *MI, unsigned OffsetWidth);		void EmitJumpTableTBInst(const MachineInstr *MI, unsigned OffsetWidth);
		void EmitNOSPECULATELOAD(const MachineInstr *MI, bool NoLower, bool NoUpper,
		int Width);
		void EmitThumbNOSPECULATELOAD(const MachineInstr *MI, bool NoLower,
		bool NoUpper, int Width);
void EmitInstruction(const MachineInstr *MI) override;		void EmitInstruction(const MachineInstr *MI) override;
bool runOnMachineFunction(MachineFunction &F) override;		bool runOnMachineFunction(MachineFunction &F) override;

void EmitConstantPool() override {		void EmitConstantPool() override {
// we emit constant pools customly!		// we emit constant pools customly!
}		}
void EmitFunctionBodyEnd() override;		void EmitFunctionBodyEnd() override;
void EmitFunctionEntryLabel() override;		void EmitFunctionEntryLabel() override;
[58 lines not shown]

lib/Target/ARM/ARMAsmPrinter.cpp

[1,185 lines not shown]	if (MI->mayStore()) {
}		}
else {		else {
MI->print(errs());		MI->print(errs());
llvm_unreachable("Unsupported opcode for unwinding information");		llvm_unreachable("Unsupported opcode for unwinding information");
}		}
}		}
}		}

		void ARMAsmPrinter::EmitNOSPECULATELOAD(const MachineInstr *MI, bool NoLower,
		bool NoUpper, int Width) {
		MCTargetStreamer &TS = *OutStreamer->getTargetStreamer();
		ARMTargetStreamer &ATS = static_cast<ARMTargetStreamer &>(TS);
		const TargetRegisterInfo *TRI = Subtarget->getRegisterInfo();

		unsigned Op = 0;
		unsigned Dst = MI->getOperand(Op++).getReg();
		unsigned Ptr = MI->getOperand(Op++).getReg();
		unsigned LowerBound = NoLower ? 0 : MI->getOperand(Op++).getReg();
		unsigned UpperBound = NoUpper ? 0 : MI->getOperand(Op++).getReg();
		unsigned FailVal = MI->getOperand(Op++).getReg();
		unsigned FailValHi = Width == 64 ? MI->getOperand(Op++).getReg() : 0;
		unsigned CmpPtr = MI->getOperand(Op++).getReg();

		unsigned FirstBoundCheck = NoLower ? UpperBound : LowerBound;
		unsigned LoadCond, FailCond;
		if (NoLower) {
		LoadCond = ARMCC::LO;
		FailCond = ARMCC::HS;
		} else if (NoUpper) {
		LoadCond = ARMCC::HS;
		FailCond = ARMCC::LO;
		} else {
		LoadCond = ARMCC::HI;
		FailCond = ARMCC::LS;
		}
		unsigned DstHi = 0;
		if (Width == 64) {
		DstHi = TRI->getSubReg(Dst, ARM::gsub_1);
		Dst = TRI->getSubReg(Dst, ARM::gsub_0);
		if (!Subtarget->isLittle())
		std::swap(FailVal, FailValHi);
		}

		// CMP cmpptr, lower
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::CMPrr)
		.addReg(CmpPtr)
		.addReg(FirstBoundCheck)
		.addImm(ARMCC::AL) // Predicate
		.addReg(0)); // CPSR in

		// CMPHS upper, cmpptr
		if (!NoLower && !NoUpper) {
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::CMPrr)
		.addReg(UpperBound)
		.addReg(CmpPtr)
		.addImm(ARMCC::HS) // Predicate
		.addReg(ARM::CPSR)); // CPSR in
		}

		// LDxHI val, [ptr]
		switch (Width) {
		default:
		llvm_unreachable("unexpected load width");
		case 8:
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::LDRBi12)
		.addReg(Dst)
		.addReg(Ptr)
		.addImm(0) // Offset
		.addImm(LoadCond) // Predicate
		.addReg(ARM::CPSR)); // CPSR in
		break;
		case 16:
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::LDRH)
		.addReg(Dst)
		.addReg(Ptr)
		.addReg(0) // Offset register
		.addImm(0) // Offset immediate
		.addImm(LoadCond) // Predicate
		.addReg(ARM::CPSR)); // CPSR in
		break;
		case 32:
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::LDRi12)
		.addReg(Dst)
		.addReg(Ptr)
		.addImm(0) // Offset
		.addImm(LoadCond) // Predicate
		.addReg(ARM::CPSR)); // CPSR in
		break;
		case 64: {
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::LDRD)
		.addReg(Dst)
		.addReg(DstHi)
		.addReg(Ptr)
		.addReg(0) // Offset reg
		.addImm(0) // Offset imm
		.addImm(LoadCond) // Predicate
		.addReg(ARM::CPSR)); // CPSR in
		break;
		}
		}

		// MOVLS val, failval
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::MOVr)
		.addReg(Dst)
		.addReg(FailVal)
		.addImm(FailCond) // Predicate
		.addReg(ARM::CPSR) // CPSR in
		.addReg(0)); // CPSR out

		if (Width == 64)
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::MOVr)
		.addReg(DstHi)
		.addReg(FailValHi)
		.addImm(FailCond) // Predicate
		.addReg(ARM::CPSR) // CPSR in
		.addReg(0)); // CPSR out

		// CSDB
		uint32_t Csdb = 0xe320f014UL;
		OutStreamer->AddComment("csdb");
		ATS.emitInst(Csdb);
		}

		void ARMAsmPrinter::EmitThumbNOSPECULATELOAD(const MachineInstr *MI,
		bool NoLower, bool NoUpper,
		int Width) {
		MCTargetStreamer &TS = *OutStreamer->getTargetStreamer();
		ARMTargetStreamer &ATS = static_cast<ARMTargetStreamer &>(TS);

		unsigned Op = 0;
		unsigned Dst = MI->getOperand(Op++).getReg();
		unsigned DstHi = Width == 64 ? MI->getOperand(Op++).getReg() : 0;
		unsigned Ptr = MI->getOperand(Op++).getReg();
		unsigned LowerBound = NoLower ? 0 : MI->getOperand(Op++).getReg();
		unsigned UpperBound = NoUpper ? 0 : MI->getOperand(Op++).getReg();
		unsigned FailVal = MI->getOperand(Op++).getReg();
		unsigned FailValHi = Width == 64 ? MI->getOperand(Op++).getReg() : 0;
		unsigned CmpPtr = MI->getOperand(Op++).getReg();

		unsigned FirstBoundCheck = NoLower ? UpperBound : LowerBound;
		unsigned LoadCond, FailCond;
		if (NoLower) {
		LoadCond = ARMCC::LO;
		FailCond = ARMCC::HS;
		} else if (NoUpper) {
		LoadCond = ARMCC::HS;
		FailCond = ARMCC::LO;
		} else {
		LoadCond = ARMCC::HI;
		FailCond = ARMCC::LS;
		}
		if (Width == 64 && !Subtarget->isLittle()) {
		std::swap(Dst, DstHi);
		std::swap(FailVal, FailValHi);
		}

		// CMP cmpptr, lower
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::tCMPr)
		.addReg(CmpPtr)
		.addReg(FirstBoundCheck)
		.addImm(ARMCC::AL) // Predicate
		.addReg(0)); // CPSR in

		if (!NoLower && !NoUpper) {
		// IT HS
		EmitToStreamer(
		*OutStreamer,
		MCInstBuilder(ARM::t2IT).addImm(ARMCC::HS).addImm(8)); // Mask: T

		// CMPHS upper, cmpptr
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::tCMPr)
		.addReg(UpperBound)
		.addReg(CmpPtr)
		.addImm(ARMCC::HS) // Predicate
		.addReg(ARM::CPSR)); // CPSR in
		}

		// IT HI
		EmitToStreamer(
		*OutStreamer,
		MCInstBuilder(ARM::t2IT).addImm(LoadCond).addImm(8)); // Mask: T

		// LDxHI dst, [ptr]
		unsigned LdrOpc;
		switch (Width) {
		default:
		llvm_unreachable("unexpected load width");
		case 8:
		LdrOpc = ARM::tLDRBi;
		break;
		case 16:
		LdrOpc = ARM::tLDRHi;
		break;
		case 32:
		LdrOpc = ARM::tLDRi;
		break;
		case 64:
		LdrOpc = ARM::t2LDRDi8;
		break;
		}
		MCInstBuilder LdrBuilder(LdrOpc);
		LdrBuilder.addReg(Dst);
		if (Width == 64) LdrBuilder.addReg(DstHi);
		LdrBuilder.addReg(Ptr)
		.addImm(0) // Offset
		.addImm(LoadCond) // Predicate
		.addReg(ARM::CPSR); // CPSR in
		EmitToStreamer(*OutStreamer, LdrBuilder);

		// TODO: extend this IT block for 64-bit loads if not deprecated
		// IT LS
		EmitToStreamer(
		*OutStreamer,
		MCInstBuilder(ARM::t2IT).addImm(FailCond).addImm(8)); // Mask: T

		// MOVLS dst, failval
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::tMOVr)
		.addReg(Dst)
		.addReg(FailVal)
		.addImm(FailCond) // Predicate
		.addReg(ARM::CPSR)); // CPSR in

		if (Width == 64) {
		// IT LS
		EmitToStreamer(
		*OutStreamer,
		MCInstBuilder(ARM::t2IT).addImm(FailCond).addImm(8)); // Mask: T

		// MOVLS dsthi, failvalhi
		EmitToStreamer(*OutStreamer, MCInstBuilder(ARM::tMOVr)
		.addReg(DstHi)
		.addReg(FailValHi)
		.addImm(FailCond) // Predicate
		.addReg(ARM::CPSR)); // CPSR in
		}

		// CSDB
		uint32_t Csdb = 0xf3af8014UL;
		OutStreamer->AddComment("csdb");
		ATS.emitInst(Csdb, 'w');
		}

// Simple pseudo-instructions have their lowering (with expansion to real		// Simple pseudo-instructions have their lowering (with expansion to real
// instructions) auto-generated.		// instructions) auto-generated.
#include "ARMGenMCPseudoLowering.inc"		#include "ARMGenMCPseudoLowering.inc"

void ARMAsmPrinter::EmitInstruction(const MachineInstr *MI) {		void ARMAsmPrinter::EmitInstruction(const MachineInstr *MI) {
const DataLayout &DL = getDataLayout();		const DataLayout &DL = getDataLayout();
MCTargetStreamer &TS = *OutStreamer->getTargetStreamer();		MCTargetStreamer &TS = *OutStreamer->getTargetStreamer();
ARMTargetStreamer &ATS = static_cast<ARMTargetStreamer &>(TS);		ARMTargetStreamer &ATS = static_cast<ARMTargetStreamer &>(TS);
[822 lines not shown]	case ARM::PATCHABLE_FUNCTION_ENTER:
LowerPATCHABLE_FUNCTION_ENTER(*MI);		LowerPATCHABLE_FUNCTION_ENTER(*MI);
return;		return;
case ARM::PATCHABLE_FUNCTION_EXIT:		case ARM::PATCHABLE_FUNCTION_EXIT:
LowerPATCHABLE_FUNCTION_EXIT(*MI);		LowerPATCHABLE_FUNCTION_EXIT(*MI);
return;		return;
case ARM::PATCHABLE_TAIL_CALL:		case ARM::PATCHABLE_TAIL_CALL:
LowerPATCHABLE_TAIL_CALL(*MI);		LowerPATCHABLE_TAIL_CALL(*MI);
return;		return;

		case ARM::NOSPECULATELOAD8_both:
		EmitNOSPECULATELOAD(MI, false, false, 8);
		return;
		case ARM::NOSPECULATELOAD16_both:
		EmitNOSPECULATELOAD(MI, false, false, 16);
		return;
		case ARM::NOSPECULATELOAD32_both:
		EmitNOSPECULATELOAD(MI, false, false, 32);
		return;
		case ARM::NOSPECULATELOAD64_both:
		EmitNOSPECULATELOAD(MI, false, false, 64);
		return;
		case ARM::NOSPECULATELOAD8_nolower:
		EmitNOSPECULATELOAD(MI, true, false, 8);
		return;
		case ARM::NOSPECULATELOAD16_nolower:
		EmitNOSPECULATELOAD(MI, true, false, 16);
		return;
		case ARM::NOSPECULATELOAD32_nolower:
		EmitNOSPECULATELOAD(MI, true, false, 32);
		return;
		case ARM::NOSPECULATELOAD64_nolower:
		EmitNOSPECULATELOAD(MI, true, false, 64);
		return;
		case ARM::NOSPECULATELOAD8_noupper:
		EmitNOSPECULATELOAD(MI, false, true, 8);
		return;
		case ARM::NOSPECULATELOAD16_noupper:
		EmitNOSPECULATELOAD(MI, false, true, 16);
		return;
		case ARM::NOSPECULATELOAD32_noupper:
		EmitNOSPECULATELOAD(MI, false, true, 32);
		return;
		case ARM::NOSPECULATELOAD64_noupper:
		EmitNOSPECULATELOAD(MI, false, true, 64);
		return;

		case ARM::tNOSPECULATELOAD8_both:
		EmitThumbNOSPECULATELOAD(MI, false, false, 8);
		return;
		case ARM::tNOSPECULATELOAD16_both:
		EmitThumbNOSPECULATELOAD(MI, false, false, 16);
		return;
		case ARM::tNOSPECULATELOAD32_both:
		EmitThumbNOSPECULATELOAD(MI, false, false, 32);
		return;
		case ARM::tNOSPECULATELOAD64_both:
		EmitThumbNOSPECULATELOAD(MI, false, false, 64);
		return;
		case ARM::tNOSPECULATELOAD8_nolower:
		EmitThumbNOSPECULATELOAD(MI, true, false, 8);
		return;
		case ARM::tNOSPECULATELOAD16_nolower:
		EmitThumbNOSPECULATELOAD(MI, true, false, 16);
		return;
		case ARM::tNOSPECULATELOAD32_nolower:
		EmitThumbNOSPECULATELOAD(MI, true, false, 32);
		return;
		case ARM::tNOSPECULATELOAD64_nolower:
		EmitThumbNOSPECULATELOAD(MI, true, false, 64);
		return;
		case ARM::tNOSPECULATELOAD8_noupper:
		EmitThumbNOSPECULATELOAD(MI, false, true, 8);
		return;
		case ARM::tNOSPECULATELOAD16_noupper:
		EmitThumbNOSPECULATELOAD(MI, false, true, 16);
		return;
		case ARM::tNOSPECULATELOAD32_noupper:
		EmitThumbNOSPECULATELOAD(MI, false, true, 32);
		return;
		case ARM::tNOSPECULATELOAD64_noupper:
		EmitThumbNOSPECULATELOAD(MI, false, true, 64);
		return;
}		}

MCInst TmpInst;		MCInst TmpInst;
LowerARMMachineInstrToMCInst(MI, TmpInst, *this);		LowerARMMachineInstrToMCInst(MI, TmpInst, *this);

EmitToStreamer(*OutStreamer, TmpInst);		EmitToStreamer(*OutStreamer, TmpInst);
}		}

[11 lines not shown]
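
In the two-bound case, both ARM emitters rely on a CMP followed by a CMPHS with swapped operands, so that a single HI/LS condition covers both checks. The sketch below models only the C and Z flags to show why HI ends up meaning lower <= cmpptr < upper. The `Flags` struct, `cmp` and `loadExecutes` are hypothetical names introduced for illustration; the single-bound variants, which use LO/HS directly, are not modelled.

```cpp
#include <cstdint>

// Hypothetical model (not part of the patch) of the C and Z flags after
// an ARM "CMP a, b" on unsigned 32-bit operands.
struct Flags {
  bool C; // carry: set when a >= b (no borrow)
  bool Z; // zero: set when a == b
};

inline Flags cmp(std::uint32_t A, std::uint32_t B) {
  return {A >= B, A == B};
}

// Returns true when the HI-predicated load would execute, and false when
// the LS-predicated move of failval would execute instead.
inline bool loadExecutes(std::uint32_t CmpPtr, std::uint32_t Lower,
                         std::uint32_t Upper) {
  Flags F = cmp(CmpPtr, Lower); // CMP   cmpptr, lower
  if (F.C)                      // HS: lower-bound check passed
    F = cmp(Upper, CmpPtr);     // CMPHS upper, cmpptr
  return F.C && !F.Z;           // HI; its inverse is LS
}
```

When the lower-bound check fails, C is clear and the CMPHS is skipped, so HI is false and the fail value is selected. When it passes, HI holds exactly when upper is strictly greater than cmpptr.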

lib/Target/ARM/ARMISelLowering.cpp

[998 lines not shown]	if (Subtarget->hasAnyDataBarrier() &&
if (!InsertFencesForAtomic) {		if (!InsertFencesForAtomic) {
setOperationAction(ISD::ATOMIC_LOAD, MVT::i32, Custom);		setOperationAction(ISD::ATOMIC_LOAD, MVT::i32, Custom);
setOperationAction(ISD::ATOMIC_STORE, MVT::i32, Custom);		setOperationAction(ISD::ATOMIC_STORE, MVT::i32, Custom);
}		}
}		}

setOperationAction(ISD::PREFETCH, MVT::Other, Custom);		setOperationAction(ISD::PREFETCH, MVT::Other, Custom);

		setOperationAction(ISD::NOSPECULATELOAD, MVT::i64, Custom);
		setOperationAction(ISD::NOSPECULATELOAD_NOLOWER, MVT::i64, Custom);
		setOperationAction(ISD::NOSPECULATELOAD_NOUPPER, MVT::i64, Custom);

// Requires SXTB/SXTH, available on v6 and up in both ARM and Thumb modes.		// Requires SXTB/SXTH, available on v6 and up in both ARM and Thumb modes.
if (!Subtarget->hasV6Ops()) {		if (!Subtarget->hasV6Ops()) {
setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand);		setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand);
setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand);		setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand);
}		}
setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand);		setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand);

if (!Subtarget->useSoftFloat() && Subtarget->hasVFP2() &&		if (!Subtarget->useSoftFloat() && Subtarget->hasVFP2() &&
[6,948 lines not shown]	static void ReplaceLongIntrinsic(SDNode *N, SmallVectorImpl<SDValue> &Results,
SDValue LongMul = DAG.getNode(Opc, dl,		SDValue LongMul = DAG.getNode(Opc, dl,
DAG.getVTList(MVT::i32, MVT::i32),		DAG.getVTList(MVT::i32, MVT::i32),
N->getOperand(1), N->getOperand(2),		N->getOperand(1), N->getOperand(2),
Lo, Hi);		Lo, Hi);
Results.push_back(LongMul.getValue(0));		Results.push_back(LongMul.getValue(0));
Results.push_back(LongMul.getValue(1));		Results.push_back(LongMul.getValue(1));
}		}

		static void ReplaceNOSPECULATELOADResults(SDNode *N,
		SmallVectorImpl<SDValue> &Results,
		SelectionDAG &DAG,
		const ARMSubtarget *Subtarget) {
		assert(N->getValueType(0) == MVT::i64 &&
		"NOSPECULATELOAD on types less than 64 should be legal");
		SDLoc dl(N);
		SDValue FailValLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
		N->getOperand(4),
		DAG.getConstant(0, dl, MVT::i32));
		SDValue FailValHi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
		N->getOperand(4),
		DAG.getConstant(1, dl, MVT::i32));

		SDValue Ops[] = {
		N->getOperand(1), // Ptr
		N->getOperand(2), // LowerBound
		N->getOperand(3), // UpperBound
		FailValLo,
		FailValHi,
		N->getOperand(5), // CmpPtr
		N->getOperand(0), // Chain
		};
		SDVTList NodeTys = Subtarget->isThumb()
		? DAG.getVTList(MVT::i32, MVT::i32, MVT::Other)
		: DAG.getVTList(MVT::Untyped, MVT::Other);
		unsigned Opcode = Subtarget->isThumb() ? ARM::tNOSPECULATELOAD64_both
		: ARM::NOSPECULATELOAD64_both;
		MachineSDNode *NewNode = DAG.getMachineNode(
		Opcode, dl, NodeTys, Ops);

		MachineFunction &MF = DAG.getMachineFunction();
		MachineSDNode::mmo_iterator MemOp = MF.allocateMemRefsArray(1);
		MemOp[0] = cast<MemSDNode>(N)->getMemOperand();
		NewNode->setMemRefs(MemOp, MemOp + 1);

		if (Subtarget->isThumb()) {
		Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl,
		MVT::i64, SDValue(NewNode, 0),
		SDValue(NewNode, 1)));
		Results.push_back(SDValue(NewNode, 2)); // Chain
		} else {
		bool isBigEndian = DAG.getDataLayout().isBigEndian();
		Results.push_back(
		DAG.getTargetExtractSubreg(isBigEndian ? ARM::gsub_1 : ARM::gsub_0,
		SDLoc(N), MVT::i32, SDValue(NewNode, 0)));
		Results.push_back(
		DAG.getTargetExtractSubreg(isBigEndian ? ARM::gsub_0 : ARM::gsub_1,
		SDLoc(N), MVT::i32, SDValue(NewNode, 0)));
		Results.push_back(SDValue(NewNode, 1)); // Chain
		}
		}

		static void ReplaceNOSPECULATELOADOneCheckResults(
		SDNode *N, SmallVectorImpl<SDValue> &Results, SelectionDAG &DAG,
		const ARMSubtarget *Subtarget) {
		assert(N->getValueType(0) == MVT::i64 &&
		"NOSPECULATELOAD on types less than 64 should be legal");
		SDLoc dl(N);
		SDValue FailValLo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
		N->getOperand(3),
		DAG.getConstant(0, dl, MVT::i32));
		SDValue FailValHi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
		N->getOperand(3),
		DAG.getConstant(1, dl, MVT::i32));

		SDValue Ops[] = {
		N->getOperand(1), // Ptr
		N->getOperand(2), // Bound
		FailValLo,
		FailValHi,
		N->getOperand(4), // CmpPtr
		N->getOperand(0), // Chain
		};
		SDVTList NodeTys = Subtarget->isThumb()
		? DAG.getVTList(MVT::i32, MVT::i32, MVT::Other)
		: DAG.getVTList(MVT::Untyped, MVT::Other);
		unsigned Opcode;
		if (N->getOpcode() == ISD::NOSPECULATELOAD_NOLOWER)
		Opcode = Subtarget->isThumb() ? ARM::tNOSPECULATELOAD64_nolower
		: ARM::NOSPECULATELOAD64_nolower;
		else
		Opcode = Subtarget->isThumb() ? ARM::tNOSPECULATELOAD64_noupper
		: ARM::NOSPECULATELOAD64_noupper;
		MachineSDNode *NewNode = DAG.getMachineNode(Opcode, dl, NodeTys, Ops);

		MachineFunction &MF = DAG.getMachineFunction();
		MachineSDNode::mmo_iterator MemOp = MF.allocateMemRefsArray(1);
		MemOp[0] = cast<MemSDNode>(N)->getMemOperand();
		NewNode->setMemRefs(MemOp, MemOp + 1);

		if (Subtarget->isThumb()) {
		Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl,
		MVT::i64, SDValue(NewNode, 0),
		SDValue(NewNode, 1)));
		Results.push_back(SDValue(NewNode, 2)); // Chain
		} else {
		bool isBigEndian = DAG.getDataLayout().isBigEndian();
		Results.push_back(
		DAG.getTargetExtractSubreg(isBigEndian ? ARM::gsub_1 : ARM::gsub_0,
		SDLoc(N), MVT::i32, SDValue(NewNode, 0)));
		Results.push_back(
		DAG.getTargetExtractSubreg(isBigEndian ? ARM::gsub_0 : ARM::gsub_1,
		SDLoc(N), MVT::i32, SDValue(NewNode, 0)));
		Results.push_back(SDValue(NewNode, 1)); // Chain
		}
		}

/// ReplaceNodeResults - Replace the results of node with an illegal result		/// ReplaceNodeResults - Replace the results of node with an illegal result
/// type with new values built out of custom code.		/// type with new values built out of custom code.
void ARMTargetLowering::ReplaceNodeResults(SDNode *N,		void ARMTargetLowering::ReplaceNodeResults(SDNode *N,
SmallVectorImpl<SDValue> &Results,		SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDValue Res;		SDValue Res;
switch (N->getOpcode()) {		switch (N->getOpcode()) {
default:		default:
...
case ISD::SDIV:
assert(Subtarget->isTargetWindows() && "can only expand DIV on Windows");		assert(Subtarget->isTargetWindows() && "can only expand DIV on Windows");
return ExpandDIV_Windows(SDValue(N, 0), DAG, N->getOpcode() == ISD::SDIV,		return ExpandDIV_Windows(SDValue(N, 0), DAG, N->getOpcode() == ISD::SDIV,
Results);		Results);
case ISD::ATOMIC_CMP_SWAP:		case ISD::ATOMIC_CMP_SWAP:
ReplaceCMP_SWAP_64Results(N, Results, DAG);		ReplaceCMP_SWAP_64Results(N, Results, DAG);
return;		return;
case ISD::INTRINSIC_WO_CHAIN:		case ISD::INTRINSIC_WO_CHAIN:
return ReplaceLongIntrinsic(N, Results, DAG);		return ReplaceLongIntrinsic(N, Results, DAG);
		case ISD::NOSPECULATELOAD:
		ReplaceNOSPECULATELOADResults(N, Results, DAG, Subtarget);
		return;
		case ISD::NOSPECULATELOAD_NOLOWER:
		case ISD::NOSPECULATELOAD_NOUPPER:
		ReplaceNOSPECULATELOADOneCheckResults(N, Results, DAG, Subtarget);
		return;
}		}
if (Res.getNode())		if (Res.getNode())
Results.push_back(Res);		Results.push_back(Res);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ARM Scheduler Hooks		// ARM Scheduler Hooks
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
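
The 64-bit paths in ReplaceNOSPECULATELOADResults and ReplaceNOSPECULATELOADOneCheckResults above split the i64 fail value with ISD::EXTRACT_ELEMENT (index 0 selects the low half, index 1 the high half) and, on Thumb, rejoin the two i32 results with ISD::BUILD_PAIR. A minimal sketch of that arithmetic on plain integers, for illustration only: splitI64 and buildPair are hypothetical names that appear neither in the patch nor in LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not in the patch): mirrors EXTRACT_ELEMENT with
// index 0 (low 32 bits) and index 1 (high 32 bits) on a 64-bit value.
constexpr std::pair<std::uint32_t, std::uint32_t> splitI64(std::uint64_t v) {
  return {static_cast<std::uint32_t>(v),
          static_cast<std::uint32_t>(v >> 32)};
}

// Hypothetical helper (not in the patch): mirrors BUILD_PAIR, where the
// first operand is the low half and the second operand is the high half.
constexpr std::uint64_t buildPair(std::uint32_t lo, std::uint32_t hi) {
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Splitting and rebuilding is the identity on any 64-bit value.
static_assert(buildPair(splitI64(0x1122334455667788ULL).first,
                        splitI64(0x1122334455667788ULL).second) ==
                  0x1122334455667788ULL,
              "split followed by build must round-trip");
```

The sketch only covers the value arithmetic. On the ARM (non-Thumb) path the patch instead extracts gsub_0/gsub_1 from a GPRPair and swaps them on big-endian targets, which this model does not attempt to represent.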

lib/Target/ARM/ARMInstrInfo.td

def TPsoft : ARMPseudoInst<(outs), (ins), 4, IIC_Br,
Requires<[IsARM, IsReadTPSoft]>;		Requires<[IsARM, IsReadTPSoft]>;
}		}

// Reading thread pointer from coprocessor register		// Reading thread pointer from coprocessor register
def : ARMPat<(ARMthread_pointer), (MRC 15, 0, 13, 0, 3)>,		def : ARMPat<(ARMthread_pointer), (MRC 15, 0, 13, 0, 3)>,
Requires<[IsARM, IsReadTPHard]>;		Requires<[IsARM, IsReadTPHard]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// Speculation barrier intrinsics
		//
		multiclass NoSpeculateLoad<code TypeCheck> {
		def _both_frag : PatFrag<(ops node:$ptr, node:$lower_bound,
		node:$upper_bound, node:$failval,
		node:$cmpptr),
		(nospeculateload node:$ptr, node:$lower_bound,
		node:$upper_bound, node:$failval,
		node:$cmpptr), TypeCheck>;
		def _nolower_frag : PatFrag<(ops node:$ptr,
		node:$upper_bound, node:$failval,
		node:$cmpptr),
		(nospeculateload_nolower node:$ptr,
		node:$upper_bound, node:$failval,
		node:$cmpptr), TypeCheck>;
		def _noupper_frag : PatFrag<(ops node:$ptr, node:$lower_bound,
		node:$failval, node:$cmpptr),
		(nospeculateload_noupper node:$ptr, node:$lower_bound,
		node:$failval,
		node:$cmpptr), TypeCheck>;

		let Defs = [CPSR], hasSideEffects = 1, isCodeGenOnly = 1, mayLoad = 1,
		Constraints = "@earlyclobber $dst" in {
		def _both : ARMPseudoInst<(outs GPRnopc:$dst),
		(ins GPRnopc:$ptr, GPRnopc:$lower_bound,
		GPRnopc:$upper_bound, GPRnopc:$failval,
		GPRnopc:$cmpptr),
		20, IIC_iCMPr,
		[(set GPRnopc:$dst,
		(!cast<SDNode>(NAME # "_both_frag") GPRnopc:$ptr,
		GPRnopc:$lower_bound, GPRnopc:$upper_bound,
		GPRnopc:$failval, GPRnopc:$cmpptr))]>,
		Sched<[]>;
		def _nolower :
		ARMPseudoInst<(outs GPRnopc:$dst),
		(ins GPRnopc:$ptr, GPRnopc:$upper_bound,
		GPRnopc:$failval, GPRnopc:$cmpptr),
		16, IIC_iCMPr,
		[(set GPRnopc:$dst,
		(!cast<SDNode>(NAME # "_nolower_frag")
		GPRnopc:$ptr, GPRnopc:$upper_bound,
		GPRnopc:$failval, GPRnopc:$cmpptr))]>,
		Sched<[]>;
		def _noupper :
		ARMPseudoInst<(outs GPRnopc:$dst),
		(ins GPRnopc:$ptr, GPRnopc:$lower_bound,
		GPRnopc:$failval, GPRnopc:$cmpptr),
		16, IIC_iCMPr,
		[(set GPRnopc:$dst,
		(!cast<SDNode>(NAME # "_noupper_frag")
		GPRnopc:$ptr, GPRnopc:$lower_bound,
		GPRnopc:$failval, GPRnopc:$cmpptr))]>,
		Sched<[]>;
		}
		}

		defm NOSPECULATELOAD8 : NoSpeculateLoad<
		[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i8;}]>;
		defm NOSPECULATELOAD16 : NoSpeculateLoad<
		[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i16;}]>;
		defm NOSPECULATELOAD32 : NoSpeculateLoad<
		[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i32;}]>;

		let Defs = [CPSR], hasSideEffects = 1, isCodeGenOnly = 1, mayLoad = 1,
		Constraints = "@earlyclobber $dst" in {
		def NOSPECULATELOAD64_both : ARMPseudoInst<(outs GPRPair:$dst),
		(ins GPRnopc:$ptr, GPRnopc:$lower_bound,
		GPRnopc:$upper_bound, GPRnopc:$failvallo,
		GPRnopc:$failvalhi, GPRnopc:$cmpptr),
		24, IIC_iCMPr,
		[]>,
		Sched<[]>;
		def NOSPECULATELOAD64_nolower : ARMPseudoInst<(outs GPRPair:$dst),
		(ins GPRnopc:$ptr, GPRnopc:$upper_bound,
		GPRnopc:$failvallo, GPRnopc:$failvalhi,
		GPRnopc:$cmpptr),
		20, IIC_iCMPr,
		[]>,
		Sched<[]>;
		def NOSPECULATELOAD64_noupper : ARMPseudoInst<(outs GPRPair:$dst),
		(ins GPRnopc:$ptr, GPRnopc:$lower_bound,
		GPRnopc:$failvallo, GPRnopc:$failvalhi,
		GPRnopc:$cmpptr),
		20, IIC_iCMPr,
		[]>,
		Sched<[]>;
		}

		//===----------------------------------------------------------------------===//
// SJLJ Exception handling intrinsics		// SJLJ Exception handling intrinsics
// eh_sjlj_setjmp() is an instruction sequence to store the return		// eh_sjlj_setjmp() is an instruction sequence to store the return
// address and save #0 in R0 for the non-longjmp case.		// address and save #0 in R0 for the non-longjmp case.
// Since by its nature we may be coming from some other function to get		// Since by its nature we may be coming from some other function to get
// here, and we're using the stack frame for the containing function to		// here, and we're using the stack frame for the containing function to
// save/restore registers, we can't keep anything live in regs across		// save/restore registers, we can't keep anything live in regs across
// the eh_sjlj_setjmp(), else it will almost certainly have been tromped upon		// the eh_sjlj_setjmp(), else it will almost certainly have been tromped upon
// when we get here from a longjmp(). We force everything out of registers		// when we get here from a longjmp(). We force everything out of registers

lib/Target/ARM/ARMInstrThumb2.td


	// Pseudo isntruction that combines movs + predicated rsbmi			// Pseudo isntruction that combines movs + predicated rsbmi
	// to implement integer ABS			// to implement integer ABS
	let usesCustomInserter = 1, Defs = [CPSR] in {			let usesCustomInserter = 1, Defs = [CPSR] in {
	def t2ABS : PseudoInst<(outs rGPR:$dst), (ins rGPR:$src),			def t2ABS : PseudoInst<(outs rGPR:$dst), (ins rGPR:$src),
	NoItinerary, []>, Requires<[IsThumb2]>;			NoItinerary, []>, Requires<[IsThumb2]>;
	}			}


				//===----------------------------------------------------------------------===//
				// Speculation barrier intrinsics
				//
				multiclass tNoSpeculateLoad<code TypeCheck> {
				def _both_frag : PatFrag<(ops node:$ptr, node:$lower_bound,
				node:$upper_bound, node:$failval,
				node:$cmpptr),
				(nospeculateload node:$ptr, node:$lower_bound,
				node:$upper_bound, node:$failval,
				node:$cmpptr), TypeCheck>;
				def _nolower_frag : PatFrag<(ops node:$ptr,
				node:$upper_bound, node:$failval,
				node:$cmpptr),
				(nospeculateload_nolower node:$ptr,
				node:$upper_bound, node:$failval,
				node:$cmpptr), TypeCheck>;
				def _noupper_frag : PatFrag<(ops node:$ptr, node:$lower_bound,
				node:$failval, node:$cmpptr),
				(nospeculateload_noupper node:$ptr, node:$lower_bound,
				node:$failval,
				node:$cmpptr), TypeCheck>;

				let Defs = [CPSR], hasSideEffects = 1, isCodeGenOnly = 1, mayLoad = 1,
				Constraints = "@earlyclobber $dst" in {
				def _both : tPseudoInst<(outs tGPR:$dst),
				(ins tGPR:$ptr, tGPR:$lower_bound,
				tGPR:$upper_bound, rGPR:$failval,
				tGPR:$cmpptr),
				18, IIC_iCMPr,
				[(set tGPR:$dst,
				(!cast<SDNode>(NAME # "_both_frag") tGPR:$ptr, tGPR:$lower_bound,
				tGPR:$upper_bound, rGPR:$failval,
				tGPR:$cmpptr))]>,
				Sched<[]>;
				def _nolower :
				tPseudoInst<(outs tGPR:$dst),
				(ins tGPR:$ptr, tGPR:$upper_bound,
				rGPR:$failval, tGPR:$cmpptr),
				14, IIC_iCMPr,
				[(set tGPR:$dst,
				(!cast<SDNode>(NAME # "_nolower_frag") tGPR:$ptr,
				tGPR:$upper_bound, rGPR:$failval,
				tGPR:$cmpptr))]>,
				Sched<[]>;
				def _noupper :
				tPseudoInst<(outs tGPR:$dst),
				(ins tGPR:$ptr, tGPR:$lower_bound,
				rGPR:$failval, tGPR:$cmpptr),
				14, IIC_iCMPr,
				[(set tGPR:$dst,
				(!cast<SDNode>(NAME # "_noupper_frag") tGPR:$ptr, tGPR:$lower_bound,
				rGPR:$failval,
				tGPR:$cmpptr))]>,
				Sched<[]>;
				}
				}

				defm tNOSPECULATELOAD8 : tNoSpeculateLoad<
				[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i8;}]>;
				defm tNOSPECULATELOAD16 : tNoSpeculateLoad<
				[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i16;}]>;
				defm tNOSPECULATELOAD32 : tNoSpeculateLoad<
				[{return cast<MemIntrinsicSDNode>(N)->getMemoryVT() == MVT::i32;}]>;

				let Defs = [CPSR], hasSideEffects = 1, isCodeGenOnly = 1, mayLoad = 1,
				Constraints = "@earlyclobber $dstlo,@earlyclobber $dsthi" in {
				def tNOSPECULATELOAD64_both : ARMPseudoInst<(outs rGPR:$dstlo, rGPR:$dsthi),
				(ins tGPR:$ptr, tGPR:$lower_bound,
				tGPR:$upper_bound, rGPR:$failvallo, rGPR:$failvalhi,
				tGPR:$cmpptr),
				24, IIC_iCMPr,
				[]>,
				Sched<[]>;
				def tNOSPECULATELOAD64_nolower : ARMPseudoInst<(outs rGPR:$dstlo, rGPR:$dsthi),
				(ins tGPR:$ptr,
				tGPR:$upper_bound, rGPR:$failvallo, rGPR:$failvalhi,
				tGPR:$cmpptr),
				20, IIC_iCMPr,
				[]>,
				Sched<[]>;
				def tNOSPECULATELOAD64_noupper : ARMPseudoInst<(outs rGPR:$dstlo, rGPR:$dsthi),
				(ins tGPR:$ptr,
				tGPR:$lower_bound, rGPR:$failvallo, rGPR:$failvalhi,
				tGPR:$cmpptr),
				20, IIC_iCMPr,
				[]>,
				Sched<[]>;
				}


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Coprocessor load/store -- for disassembly only			// Coprocessor load/store -- for disassembly only
	//			//
	class T2CI<bits<4> op31_28, dag oops, dag iops, string opc, string asm, list<dag> pattern>			class T2CI<bits<4> op31_28, dag oops, dag iops, string opc, string asm, list<dag> pattern>
	: T2I<oops, iops, NoItinerary, opc, asm, pattern> {			: T2I<oops, iops, NoItinerary, opc, asm, pattern> {
	let Inst{31-28} = op31_28;			let Inst{31-28} = op31_28;
	let Inst{27-25} = 0b110;			let Inst{27-25} = 0b110;
	}			}

test/CodeGen/AArch64/no-speculate.ll

This file was added.

				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-eabi | FileCheck %s --check-prefixes=CHECK
				; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64_be-eabi | FileCheck %s --check-prefixes=CHECK
				; RUN: llc -fast-isel -verify-machineinstrs < %s -mtriple=aarch64-eabi | FileCheck %s --check-prefixes=CHECK
				; RUN: llc -global-isel -global-isel-abort=0 -verify-machineinstrs < %s -mtriple=aarch64-eabi | FileCheck %s --check-prefixes=CHECK

				declare i8 @llvm.nospeculateload.i8(i8*, i8*, i8*, i8, i8*)
				declare i8 @llvm.nospeculateload.nolower.i8(i8*, i8*, i8, i8*)
				declare i8 @llvm.nospeculateload.noupper.i8(i8*, i8*, i8, i8*)
				declare i16 @llvm.nospeculateload.i16(i16*, i8*, i8*, i16, i8*)
				declare i16 @llvm.nospeculateload.nolower.i16(i16*, i8*, i16, i8*)
				declare i16 @llvm.nospeculateload.noupper.i16(i16*, i8*, i16, i8*)
				declare i32 @llvm.nospeculateload.i32(i32*, i8*, i8*, i32, i8*)
				declare i32 @llvm.nospeculateload.nolower.i32(i32*, i8*, i32, i8*)
				declare i32 @llvm.nospeculateload.noupper.i32(i32*, i8*, i32, i8*)
				declare i64 @llvm.nospeculateload.i64(i64*, i8*, i8*, i64, i8*)
				declare i64 @llvm.nospeculateload.nolower.i64(i64*, i8*, i64, i8*)
				declare i64 @llvm.nospeculateload.noupper.i64(i64*, i8*, i64, i8*)
				declare i128 @llvm.nospeculateload.i128(i128*, i8*, i8*, i128, i8*)
				declare i128 @llvm.nospeculateload.nolower.i128(i128*, i8*, i128, i8*)
				declare i128 @llvm.nospeculateload.noupper.i128(i128*, i8*, i128, i8*)

				define i8 @f_i8(i8* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i8 %failval) {
				entry:
				%0 = tail call i8 @llvm.nospeculateload.i8(i8* %ptr, i8* %lowerbound, i8* %upperbound, i8 %failval, i8* %cmpptr)
				ret i8 %0
				; CHECK-LABEL: f_i8:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: ccmp x3, x2, #2, hs
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldrb [[DST:w[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], w4, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov w0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i8 @f_i8_nolower(i8* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i8 %failval) {
				entry:
				%0 = tail call i8 @llvm.nospeculateload.nolower.i8(i8* %ptr, i8* %upperbound, i8 %failval, i8* %cmpptr)
				ret i8 %0
				; CHECK-LABEL: f_i8_nolower:
				; CHECK: cmp x3, x2
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldrb [[DST:w[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], w4, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov w0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i8 @f_i8_noupper(i8* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i8 %failval) {
				entry:
				%0 = tail call i8 @llvm.nospeculateload.noupper.i8(i8* %ptr, i8* %lowerbound, i8 %failval, i8* %cmpptr)
				ret i8 %0
				; CHECK-LABEL: f_i8_noupper:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: b.lo [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldrb [[DST:w[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], w4, hs
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov w0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i16 @f_i16(i16* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i16 %failval) {
				entry:
				%0 = tail call i16 @llvm.nospeculateload.i16(i16* %ptr, i8* %lowerbound, i8* %upperbound, i16 %failval, i8* %cmpptr)
				ret i16 %0
				; CHECK-LABEL: f_i16:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: ccmp x3, x2, #2, hs
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldrh [[DST:w[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], w4, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov w0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i16 @f_i16_nolower(i16* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i16 %failval) {
				entry:
				%0 = tail call i16 @llvm.nospeculateload.nolower.i16(i16* %ptr, i8* %upperbound, i16 %failval, i8* %cmpptr)
				ret i16 %0
				; CHECK-LABEL: f_i16_nolower:
				; CHECK: cmp x3, x2
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldrh [[DST:w[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], w4, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov w0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i16 @f_i16_noupper(i16* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i16 %failval) {
				entry:
				%0 = tail call i16 @llvm.nospeculateload.noupper.i16(i16* %ptr, i8* %lowerbound, i16 %failval, i8* %cmpptr)
				ret i16 %0
				; CHECK-LABEL: f_i16_noupper:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: b.lo [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldrh [[DST:w[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], w4, hs
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov w0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i32 @f_i32(i32* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i32 %failval) {
				entry:
				%0 = tail call i32 @llvm.nospeculateload.i32(i32* %ptr, i8* %lowerbound, i8* %upperbound, i32 %failval, i8* %cmpptr)
				ret i32 %0
				; CHECK-LABEL: f_i32:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: ccmp x3, x2, #2, hs
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldr [[DST:w[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], w4, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov w0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i32 @f_i32_nolower(i32* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i32 %failval) {
				entry:
				%0 = tail call i32 @llvm.nospeculateload.nolower.i32(i32* %ptr, i8* %upperbound, i32 %failval, i8* %cmpptr)
				ret i32 %0
				; CHECK-LABEL: f_i32_nolower:
				; CHECK: cmp x3, x2
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldr [[DST:w[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], w4, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov w0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i32 @f_i32_noupper(i32* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i32 %failval) {
				entry:
				%0 = tail call i32 @llvm.nospeculateload.noupper.i32(i32* %ptr, i8* %lowerbound, i32 %failval, i8* %cmpptr)
				ret i32 %0
				; CHECK-LABEL: f_i32_noupper:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: b.lo [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldr [[DST:w[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], w4, hs
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov w0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i64 @f_i64(i64* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i64 %failval) {
				entry:
				%0 = tail call i64 @llvm.nospeculateload.i64(i64* %ptr, i8* %lowerbound, i8* %upperbound, i64 %failval, i8* %cmpptr)
				ret i64 %0
				; CHECK-LABEL: f_i64:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: ccmp x3, x2, #2, hs
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldr [[DST:x[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], x4, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov x0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i64 @f_i64_nolower(i64* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i64 %failval) {
				entry:
				%0 = tail call i64 @llvm.nospeculateload.nolower.i64(i64* %ptr, i8* %upperbound, i64 %failval, i8* %cmpptr)
				ret i64 %0
				; CHECK-LABEL: f_i64_nolower:
				; CHECK: cmp x3, x2
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldr [[DST:x[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], x4, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov x0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i64 @f_i64_noupper(i64* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i64 %failval) {
				entry:
				%0 = tail call i64 @llvm.nospeculateload.noupper.i64(i64* %ptr, i8* %lowerbound, i64 %failval, i8* %cmpptr)
				ret i64 %0
				; CHECK-LABEL: f_i64_noupper:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: b.lo [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldr [[DST:x[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST]], [[DST]], x4, hs
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov x0, [[DST]]
				; CHECK-NEXT: ret
				}

				define i128 @f_i128(i128* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i128 %failval) {
				entry:
				%0 = tail call i128 @llvm.nospeculateload.i128(i128* %ptr, i8* %lowerbound, i8* %upperbound, i128 %failval, i8* %cmpptr)
				ret i128 %0
				; CHECK-LABEL: f_i128:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: ccmp x3, x2, #2, hs
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldp [[DST1:x[0-9]+]], [[DST2:x[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST1]], [[DST1]], x4, lo
				; CHECK-NEXT: csel [[DST2]], [[DST2]], x5, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov x0, [[DST1]]
				; CHECK-NEXT: mov x1, [[DST2]]
				; CHECK-NEXT: ret
				}

				define i128 @f_i128_nolower(i128* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i128 %failval) {
				entry:
				%0 = tail call i128 @llvm.nospeculateload.nolower.i128(i128* %ptr, i8* %upperbound, i128 %failval, i8* %cmpptr)
				ret i128 %0
				; CHECK-LABEL: f_i128_nolower:
				; CHECK: cmp x3, x2
				; CHECK-NEXT: b.hs [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldp [[DST1:x[0-9]+]], x1, [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST1]], [[DST1]], x4, lo
				; CHECK-NEXT: csel x1, x1, x5, lo
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov x0, [[DST1]]
				; CHECK-NEXT: ret
				}

				define i128 @f_i128_noupper(i128* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i128 %failval) {
				entry:
				%0 = tail call i128 @llvm.nospeculateload.noupper.i128(i128* %ptr, i8* %lowerbound, i128 %failval, i8* %cmpptr)
				ret i128 %0
				; CHECK-LABEL: f_i128_noupper:
				; CHECK: cmp x3, x1
				; CHECK-NEXT: b.lo [[FAILLABEL:.[0-9a-zA-Z]+]]
				; CHECK-NEXT: ldp [[DST1:x[0-9]+]], [[DST2:x[0-9]+]], [x0]
				; CHECK-NEXT: [[FAILLABEL]]:
				; CHECK-NEXT: csel [[DST1]], [[DST1]], x4, hs
				; CHECK-NEXT: csel [[DST2]], [[DST2]], x5, hs
				; CHECK-NEXT: hint #20
				; CHECK-NEXT: mov x0, [[DST1]]
				; CHECK-NEXT: mov x1, [[DST2]]
				; CHECK-NEXT: ret
				}
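
The CHECK lines above pin down the architectural result of each variant: the load happens only when cmpptr passes the unsigned bounds test, and failval is selected otherwise. A minimal reference model of that result, as a sketch only: the three function names are hypothetical, and the model says nothing about speculation, which is what the real lowering (conditional select followed by the `hint #20` barrier) is there to constrain.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model (not in the patch) of the architectural
// result of llvm.nospeculateload: load when lower <= cmpptr < upper
// (unsigned address comparison), otherwise return failval.
template <typename T>
T noSpeculateLoadModel(const T *ptr, const void *lower, const void *upper,
                       T failval, const void *cmpptr) {
  auto c = reinterpret_cast<std::uintptr_t>(cmpptr);
  bool inRange = c >= reinterpret_cast<std::uintptr_t>(lower) &&
                 c < reinterpret_cast<std::uintptr_t>(upper);
  return inRange ? *ptr : failval;
}

// Hypothetical model of the .nolower variant: only cmpptr < upper is checked.
template <typename T>
T noSpeculateLoadNoLowerModel(const T *ptr, const void *upper, T failval,
                              const void *cmpptr) {
  bool inRange = reinterpret_cast<std::uintptr_t>(cmpptr) <
                 reinterpret_cast<std::uintptr_t>(upper);
  return inRange ? *ptr : failval;
}

// Hypothetical model of the .noupper variant: only cmpptr >= lower is checked.
template <typename T>
T noSpeculateLoadNoUpperModel(const T *ptr, const void *lower, T failval,
                              const void *cmpptr) {
  bool inRange = reinterpret_cast<std::uintptr_t>(cmpptr) >=
                 reinterpret_cast<std::uintptr_t>(lower);
  return inRange ? *ptr : failval;
}
```

The comparison directions are read off the expected assembly: `b.hs` to the fail label after `cmp x3, x2` means the upper bound is exclusive, and `b.lo` after `cmp x3, x1` means the lower bound is inclusive.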

test/CodeGen/ARM/no-speculate.ll

This file was added.

				; RUN: llc -verify-machineinstrs < %s -mtriple=armv7a-eabi | FileCheck %s --check-prefixes=CHECK,CHECK-ARM,CHECK-LE
				; RUN: llc -verify-machineinstrs < %s -mtriple=thumbv7a-eabi | FileCheck %s --check-prefixes=CHECK,CHECK-THUMB,CHECK-LE
				; RUN: llc -verify-machineinstrs < %s -mtriple=thumbv8a-eabi | FileCheck %s --check-prefixes=CHECK,CHECK-THUMB,CHECK-LE
				; RUN: llc -verify-machineinstrs < %s -mtriple=armv7a_be-eabi | FileCheck %s --check-prefixes=CHECK,CHECK-ARM,CHECK-BE
				; RUN: llc -verify-machineinstrs < %s -mtriple=thumbv7a_be-eabi | FileCheck %s --check-prefixes=CHECK,CHECK-THUMB,CHECK-BE
				; RUN: llc -verify-machineinstrs < %s -mtriple=thumbv8a_be-eabi | FileCheck %s --check-prefixes=CHECK,CHECK-THUMB,CHECK-BE
				; RUN: llc -fast-isel -verify-machineinstrs < %s -mtriple=armv7a-eabi | FileCheck %s --check-prefixes=CHECK,CHECK-ARM
				; RUN: llc -global-isel -global-isel-abort=0 -verify-machineinstrs < %s -mtriple=armv7a-eabi | FileCheck %s --check-prefixes=CHECK,CHECK-ARM


				declare i8 @llvm.nospeculateload.i8(i8*, i8*, i8*, i8, i8*)
				declare i8 @llvm.nospeculateload.nolower.i8(i8*, i8*, i8, i8*)
				declare i8 @llvm.nospeculateload.noupper.i8(i8*, i8*, i8, i8*)
				declare i16 @llvm.nospeculateload.i16(i16*, i8*, i8*, i16, i8*)
				declare i16 @llvm.nospeculateload.nolower.i16(i16*, i8*, i16, i8*)
				declare i16 @llvm.nospeculateload.noupper.i16(i16*, i8*, i16, i8*)
				declare i32 @llvm.nospeculateload.i32(i32*, i8*, i8*, i32, i8*)
				declare i32 @llvm.nospeculateload.nolower.i32(i32*, i8*, i32, i8*)
				declare i32 @llvm.nospeculateload.noupper.i32(i32*, i8*, i32, i8*)
				declare i64 @llvm.nospeculateload.i64(i64*, i8*, i8*, i64, i8*)
				declare i64 @llvm.nospeculateload.nolower.i64(i64*, i8*, i64, i8*)
				declare i64 @llvm.nospeculateload.noupper.i64(i64*, i8*, i64, i8*)

				define i8 @f_i8(i8* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i8 %failval) {
				entry:
				%0 = tail call i8 @llvm.nospeculateload.i8(i8* %ptr, i8* %lowerbound, i8* %upperbound, i8 %failval, i8* %cmpptr)
				ret i8 %0
				; CHECK-LABEL: f_i8:
				; %failval is passed on the stack
				; CHECK: ldr{{()|b}}{{()|.w}} [[FAILVAL:r[0-9]+|lr]], [sp{{()|, #[0-9]+}}]
				; CHECK-NEXT: cmp r3, r1
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: cmphs r2, r3
				; CHECK-THUMB-NEXT: it hi
				; CHECK-NEXT: ldrbhi [[DST:r[0-9]+]], [r0]
				; CHECK-THUMB-NEXT: it ls
				; CHECK-NEXT: movls [[DST]], [[FAILVAL]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i8 @f_i8_nolower(i8* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i8 %failval) {
				entry:
				%0 = tail call i8 @llvm.nospeculateload.nolower.i8(i8* %ptr, i8* %upperbound, i8 %failval, i8* %cmpptr)
				ret i8 %0
				; CHECK-LABEL: f_i8_nolower:
				; %failval is passed on the stack
				; CHECK: ldr{{()|b}}{{()|.w}} [[FAILVAL:r[0-9]+|lr]], [sp{{()|, #[0-9]+}}]
				; CHECK-NEXT: cmp r3, r2
				; CHECK-THUMB-NEXT: it lo
				; CHECK-NEXT: ldrblo [[DST:r[0-9]+]], [r0]
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: movhs [[DST]], [[FAILVAL]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i8 @f_i8_noupper(i8* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i8 %failval) {
				entry:
				%0 = tail call i8 @llvm.nospeculateload.noupper.i8(i8* %ptr, i8* %lowerbound, i8 %failval, i8* %cmpptr)
				ret i8 %0
				; CHECK-LABEL: f_i8_noupper:
				; %failval is passed on the stack
				; CHECK: ldr{{()|b}}{{()|.w}} [[FAILVAL:r[0-9]+|lr]], [sp{{()|, #[0-9]+}}]
				; CHECK-NEXT: cmp r3, r1
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: ldrbhs [[DST:r[0-9]+]], [r0]
				; CHECK-THUMB-NEXT: it lo
				; CHECK-NEXT: movlo [[DST]], [[FAILVAL]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i16 @f_i16(i16* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i16 %failval) {
				entry:
				%0 = tail call i16 @llvm.nospeculateload.i16(i16* %ptr, i8* %lowerbound, i8* %upperbound, i16 %failval, i8* %cmpptr)
				ret i16 %0
				; CHECK-LABEL: f_i16:
				; %failval is passed on the stack
				; CHECK: ldr{{()|h}}{{()|.w}} [[FAILVAL:r[0-9]+|lr]], [sp{{()|, #[0-9]+}}]
				; CHECK-NEXT: cmp r3, r1
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: cmphs r2, r3
				; CHECK-THUMB-NEXT: it hi
				; CHECK-NEXT: ldrhhi [[DST:r[0-9]+]], [r0]
				; CHECK-THUMB-NEXT: it ls
				; CHECK-NEXT: movls [[DST]], [[FAILVAL]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i16 @f_i16_nolower(i16* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i16 %failval) {
				entry:
				%0 = tail call i16 @llvm.nospeculateload.nolower.i16(i16* %ptr, i8* %upperbound, i16 %failval, i8* %cmpptr)
				ret i16 %0
				; CHECK-LABEL: f_i16_nolower:
				; %failval is passed on the stack
				; CHECK: ldr{{()|h}}{{()|.w}} [[FAILVAL:r[0-9]+|lr]], [sp{{()|, #[0-9]+}}]
				; CHECK-NEXT: cmp r3, r2
				; CHECK-THUMB-NEXT: it lo
				; CHECK-NEXT: ldrhlo [[DST:r[0-9]+]], [r0]
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: movhs [[DST]], [[FAILVAL]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i16 @f_i16_noupper(i16* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i16 %failval) {
				entry:
				%0 = tail call i16 @llvm.nospeculateload.noupper.i16(i16* %ptr, i8* %lowerbound, i16 %failval, i8* %cmpptr)
				ret i16 %0
				; CHECK-LABEL: f_i16_noupper:
				; %failval is passed on the stack
				; CHECK: ldr{{()|h}}{{()|.w}} [[FAILVAL:r[0-9]+|lr]], [sp{{()|, #[0-9]+}}]
				; CHECK-NEXT: cmp r3, r1
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: ldrhhs [[DST:r[0-9]+]], [r0]
				; CHECK-THUMB-NEXT: it lo
				; CHECK-NEXT: movlo [[DST]], [[FAILVAL]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i32 @f_i32(i32* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i32 %failval) {
				entry:
				%0 = tail call i32 @llvm.nospeculateload.i32(i32* %ptr, i8* %lowerbound, i8* %upperbound, i32 %failval, i8* %cmpptr)
				ret i32 %0
				; CHECK-LABEL: f_i32:
				; %failval is passed on the stack
				; CHECK: ldr{{()|.w}} [[FAILVAL:r[0-9]+|lr]], [sp{{()|, #[0-9]+}}]
				; CHECK-NEXT: cmp r3, r1
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: cmphs r2, r3
				; CHECK-THUMB-NEXT: it hi
				; CHECK-NEXT: ldrhi [[DST:r[0-9]+]], [r0]
				; CHECK-THUMB-NEXT: it ls
				; CHECK-NEXT: movls [[DST]], [[FAILVAL]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i32 @f_i32_nolower(i32* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i32 %failval) {
				entry:
				%0 = tail call i32 @llvm.nospeculateload.nolower.i32(i32* %ptr, i8* %upperbound, i32 %failval, i8* %cmpptr)
				ret i32 %0
				; CHECK-LABEL: f_i32_nolower:
				; %failval is passed on the stack
				; CHECK: ldr{{()|.w}} [[FAILVAL:r[0-9]+|lr]], [sp{{()|, #[0-9]+}}]
				; CHECK-NEXT: cmp r3, r2
				; CHECK-THUMB-NEXT: it lo
				; CHECK-NEXT: ldrlo [[DST:r[0-9]+]], [r0]
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: movhs [[DST]], [[FAILVAL]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i32 @f_i32_noupper(i32* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i32 %failval) {
				entry:
				%0 = tail call i32 @llvm.nospeculateload.noupper.i32(i32* %ptr, i8* %lowerbound, i32 %failval, i8* %cmpptr)
				ret i32 %0
				; CHECK-LABEL: f_i32_noupper:
				; %failval is passed on the stack
				; CHECK: ldr{{()|.w}} [[FAILVAL:r[0-9]+|lr]], [sp{{()|, #[0-9]+}}]
				; CHECK-NEXT: cmp r3, r1
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: ldrhs [[DST:r[0-9]+]], [r0]
				; CHECK-THUMB-NEXT: it lo
				; CHECK-NEXT: movlo [[DST]], [[FAILVAL]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i64 @f_i64(i64* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i64 %failval) {
				entry:
				%0 = tail call i64 @llvm.nospeculateload.i64(i64* %ptr, i8* %lowerbound, i8* %upperbound, i64 %failval, i8* %cmpptr)
				ret i64 %0
				; CHECK-LABEL: f_i64:
				; %failval is passed on the stack in two 4-byte slots. Depending on the variant
				; (e.g. Arm vs. Thumb), these are loaded with either two ldr instructions or a
				; single ldrd, which is too complex to check explicitly here.
				; CHECK: cmp r3, r1
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: cmphs r2, r3
				; CHECK-THUMB-NEXT: it hi
				; CHECK-NEXT: ldrdhi [[DST1:r[0-9]+|lr]], [[DST2:r[0-9]+|lr]], [r0]
				; CHECK-THUMB-NEXT: it ls
				; CHECK-NEXT: movls [[DST1]], [[FAILVAL1:r[0-9]+|lr]]
				; CHECK-THUMB-NEXT: it ls
				; CHECK-NEXT: movls [[DST2]], [[FAILVAL2:r[0-9]+|lr]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i64 @f_i64_nolower(i64* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i64 %failval) {
				entry:
				%0 = tail call i64 @llvm.nospeculateload.nolower.i64(i64* %ptr, i8* %upperbound, i64 %failval, i8* %cmpptr)
				ret i64 %0
				; CHECK-LABEL: f_i64_nolower:
				; %failval is passed on the stack in two 4-byte slots. Depending on the variant
				; (e.g. Arm vs. Thumb), these are loaded with either two ldr instructions or a
				; single ldrd, which is too complex to check explicitly here.
				; CHECK: cmp r3, r2
				; CHECK-THUMB-NEXT: it lo
				; CHECK-NEXT: ldrdlo [[DST1:r[0-9]+|lr]], [[DST2:r[0-9]+|lr]], [r0]
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: movhs [[DST1]], [[FAILVAL1:r[0-9]+|lr]]
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: movhs [[DST2]], [[FAILVAL2:r[0-9]+|lr]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}

				define i64 @f_i64_noupper(i64* %ptr, i8* %lowerbound, i8* %upperbound, i8* %cmpptr, i64 %failval) {
				entry:
				%0 = tail call i64 @llvm.nospeculateload.noupper.i64(i64* %ptr, i8* %lowerbound, i64 %failval, i8* %cmpptr)
				ret i64 %0
				; CHECK-LABEL: f_i64_noupper:
				; %failval is passed on the stack in two 4-byte slots. Depending on the variant
				; (e.g. Arm vs. Thumb), these are loaded with either two ldr instructions or a
				; single ldrd, which is too complex to check explicitly here.
				; CHECK: cmp r3, r1
				; CHECK-THUMB-NEXT: it hs
				; CHECK-NEXT: ldrdhs [[DST1:r[0-9]+|lr]], [[DST2:r[0-9]+|lr]], [r0]
				; CHECK-THUMB-NEXT: it lo
				; CHECK-NEXT: movlo [[DST1]], [[FAILVAL1:r[0-9]+|lr]]
				; CHECK-THUMB-NEXT: it lo
				; CHECK-NEXT: movlo [[DST2]], [[FAILVAL2:r[0-9]+|lr]]
				; check for csdb encoding:
				; CHECK-ARM-NEXT: .inst 0xe320f014
				; CHECK-THUMB-NEXT: .inst.w 0xf3af8014
				}
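
The CHECK lines above encode the bounds test purely through condition flags, which can be hard to follow at a glance. The following is a small reference model, not part of the patch, of what the checked sequences appear to compute architecturally. It assumes the register mapping implied by the checks (r0 = %ptr, r1 = %lowerbound, r2 = %upperbound, r3 = %cmpptr), treats addresses as plain 32-bit integers, and uses a dict as a stand-in for memory. All function names are hypothetical. The csdb barrier is not modelled, on the assumption that it only constrains speculation and does not change the architectural result.

```python
# Hypothetical reference model of the flag logic in the checked ARM sequences.
# Addresses are plain integers; `memory` is a dict standing in for real memory.

MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (C, Z) as set by `cmp a, b` on 32-bit unsigned operands."""
    a &= MASK32
    b &= MASK32
    return (a >= b, a == b)  # C set means no borrow; Z set means equal


def nospeculateload(memory, ptr, lower, upper, failval, cmpptr):
    """Full form: load only if lower <= cmpptr < upper (unsigned)."""
    c, z = cmp_flags(cmpptr, lower)       # cmp   r3, r1
    if c:                                 # cmphs r2, r3 (runs only when HS)
        c, z = cmp_flags(upper, cmpptr)
    if c and not z:                       # ldrhi dst, [r0]
        return memory[ptr]
    return failval                        # movls dst, failval


def nospeculateload_nolower(memory, ptr, upper, failval, cmpptr):
    """No lower bound: load only if cmpptr < upper (unsigned)."""
    c, _ = cmp_flags(cmpptr, upper)       # cmp   r3, r2
    if not c:                             # ldrlo dst, [r0]
        return memory[ptr]
    return failval                        # movhs dst, failval


def nospeculateload_noupper(memory, ptr, lower, failval, cmpptr):
    """No upper bound: load only if cmpptr >= lower (unsigned)."""
    c, _ = cmp_flags(cmpptr, lower)       # cmp   r3, r1
    if c:                                 # ldrhs dst, [r0]
        return memory[ptr]
    return failval                        # movlo dst, failval
```

Under this model the upper bound is exclusive and the lower bound inclusive, which matches the hi/ls and hs/lo condition pairs in the checks.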


Diff 128727
