Page MenuHomePhabricator

Introduce control flow speculation tracking pass for AArch64.
ClosedPublic

Authored by kristof.beyls on Nov 26 2018, 6:24 AM.

Details

Summary

The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".

At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask of vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:

  • speculative load hardening to automatically mask of data loaded in registers.
  • using intrinsics to mask of data in registers as indicated by the programmer (see https://lwn.net/Articles/759423/).

For AArch64, the following implementation choices have been made in this patch.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.

  • The speculation hardening is done after register allocation. With a relative abundance of registers, one register is reserved (X16) to be the taint register. X16 is expected to not clash with other register reservation mechanisms with very high probability because: . The AArch64 ABI doesn't guarantee X16 to be retained across any call. . There currently isn't even a user interface to reserve X16. If needed, the choice of register to reserve could be made flexible on a per function basis.
  • It is easy to insert mask operations at this late stage as we have mask operations available that don't set flags.
  • The taint variable contains all-ones when no miss-speculation is detected, and contains all-zeros when miss-speculation is detected. Therefore, when masking, an AND instruction (which only changes the register to be masked, no other side effects) can easily be inserted anywhere that's needed.
  • The tracking of miss-speculation is done by using a data-flow conditional select instruction (CSEL) to evaluate the flags that were also used to make conditional branch direction decisions. Speculation of the CSEL instruction can be limited with a CSDB instruction - so the combination of CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL aren't speculated. When conditional branch direction gets miss-speculated, the semantics of the inserted CSEL instruction is such that the taint register will contain all zero bits. One key requirement for this to work is that the conditional branch is followed by an execution of the CSEL instruction, where the CSEL instruction needs to use the same flags status as the conditional branch. This means that the conditional branches must not be implemented as one of the AArch64 conditional branches that do not use the flags as input (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction selectors to not produce these instructions when speculation hardening is enabled. This pass will assert if it does encounter such an instruction.
  • On function call boundaries, the miss-speculation state is transferred from the taint register X16 to be encoded in the SP register as value 0.

Future extensions/improvements could be:

  • Implement this functionality using full speculation barriers, akin to the x86-slh-lfence option. This may be more useful for the intrinsics-based approach than for the SLH approach to masking.
  • No indirect branch misprediction gets protected/instrumented; but this could be done for some indirect branches, such as switch jump tables.

I'm still working on a follow on patch that builds on top of this to introduce
speculative load hardening. Nonetheless, I wanted to get this out already to
start collecting feedback (and split up the functionality into smaller parts to
make it easier to review).
I'm more busy than usual in the coming 2 weeks, so may be a bit slow to react
to review feedback, I'm afraid.

Diff Detail

Repository
rL LLVM

Event Timeline

kristof.beyls created this revision.Nov 26 2018, 6:24 AM

There currently isn't even a user interface to reserve X16.

X16 can be reserved by the user using inline assembly, either with a clobber or named register variable. I can think of a few cases where this might happen in real code:

  • D51432 uses it in libunwind to implement unwinding through functions using return-address signing.
  • Inline assembly which contains a function call will need to clobber X16 and X17.

If moving this pass to before register allocation would add a lot of extra complexity, maybe this could also be solved by copying the taint into SP before each inline asm block, and back out afterwards, like we currently do for calls?

lib/Target/AArch64/AArch64SpeculationHardening.cpp
116 ↗(On Diff #175239)

What does VR stand for here? I'd assume Virtual Register, but these are only ever physical registers.

117 ↗(On Diff #175239)

Unused variable.

test/CodeGen/AArch64/speculation-hardening.ll
11 ↗(On Diff #175239)

Should we also check that these instructions are not emitted when SLH is disabled?

test/CodeGen/AArch64/speculation-hardening.mir
11 ↗(On Diff #175239)

I think it would also be worth testing blocks ending in indirect branches, to make sure we ignore them for now.

  • Addressing most of Oliver's comments.
kristof.beyls marked 4 inline comments as done.Dec 6 2018, 7:26 AM

There currently isn't even a user interface to reserve X16.

X16 can be reserved by the user using inline assembly, either with a clobber or named register variable. I can think of a few cases where this might happen in real code:

D51432 uses it in libunwind to implement unwinding through functions using return-address signing.
Inline assembly which contains a function call will need to clobber X16 and X17.
If moving this pass to before register allocation would add a lot of extra complexity, maybe this could also be solved by copying the taint into SP before each inline asm block, and back out afterwards, like we currently do for calls?

Thanks for pointing out the possibility of X16 and X17 being forced to use by inline assembly code after all.
As we've discussed, we expect very little code to actually go and use X16.

I think there are at least 3 possible solutions to this, each with their set of pros and cons.

  1. Change the implementation so that it runs pre register allocation.
    • Cons:
      • Control flow and memory accesses that may be introduced by later passes (e.g. spill/fill code) will not be protected.
      • The implementation will be a lot more complex.
    • Pros:
      • Implementations of SLH on some other architectures may have no choice but to implement this pre register allocation. If it were implemented pre register allocation for all architectures, there might be slightly more consistency in the protection given between architectures. Maybe.
  1. Go for the suggestion you've made on storing speculation state in the stack pointer before the inline assembly block and restoring it after.
    • Cons:
      • This would need a code sequence to change the value in SP based on the speculation state in X16. The code sequences used at function entry/exit for this cannot be used here, because those clobber X17 and the flags, which may be live when not at a function boundary. On AArch64, there are only a few instructions that can access that stack pointer, which is why we end up using temporary register X17 in the function boundary sequences. I have come up with a code sequence to encode miss-speculation in the SP that doesn't clobber any other register, nor the flags - but only if the encoding of miss-speculation in the stack pointer is allowed to be different: the least significant bit being set to 1. Obviously, this will not work if code uses a byte aligned stack pointer. I'm afraid I don't have a good insight into whether code would ever use a byte aligned stack pointer.
      • The X16 register may be live across multiple basic blocks/control flow. Applying this technique would remove the miss-speculation tracking from the edges on which the X16 register is live before SLH is applied, resulting in silently dropping some protection the user may expect.
    • Pros:
      • This may be a relatively simple way to still protect the very few functions that absolutely must use X16.
  1. Do not apply SLH hardening to a function making hard-coded use of X16, and provide a warning (or error) when a user requests SLH on such a function.
    • Cons:
      • A function will not be protected by SLH, even if it was requested. The warning/error reported will inform the user to either adapt the code in the function to not use X16 or to accept whatever risk there may be in leaving this function unprotected.
    • Pros:
      • A relatively simple implementation, while still keeping the advantages of doing speculation hardening very late (after register allocation).
      • No silent non-protected code.
      • The warning is assumed to trigger to an extremely small amount of code.

Overall, option 3 seems to have the best tradeoffs to me so I'll explore implementing that option further.

This updated diff addresses Oliver's final comment, namely about what to do when the program uses X16, e.g because of use of inline assembly specifically requesting to use X16.
Besides all the options I came up with previously, there is a 4th option: prevent speculation by inserting DSB SYS/ISB instruction pairs. Conceptually this is not unlike using lfence in X86SpeculativeLoadHardening.
I believe the pros/cons of this approach are (compared to the 3 options I listed previously):
Pros:

  • A relatively simple implementation.
  • Still keeping the advantages of doing speculation hardening very late.
  • No silent non-protected code.
  • This alternative protection mechanism is expected to only be needed for a very small amount of code.

Cons:

  • This likely has higher overhead than if we could still use a data flow mechanism to track when control flow miss-speculation happens.

Overall, this method seems to offer the best trade-off out of all options listed.

olista01 accepted this revision.Dec 17 2018, 8:47 AM

LGTM.

This revision is now accepted and ready to land.Dec 17 2018, 8:47 AM
This revision was automatically updated to reflect the committed changes.
zbrid added a comment.Jan 11 2019, 4:02 PM

Hi Kristof,
Sorry about this late review. I wanted to make sure that I understood your implementation of SLH for ARM, so I took a look and made a few comments with some questions on this first patch. Let me know your thoughts. I'm planning to take a look at the other commit you made a few days ago related to SLH too. Hopefully I didn't make too many comments that were already addressed in the follow up commit.

llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp
100

Why is this a macro instead of a const variable?

164–166

Just curious in general about this, are there cases where this is expected to happen in practice? It seems useless to have a branch that goes to the same place for both true and false.

172

How does this assert work? I don't understand how you're ANDing with the string

"unknown Cond array format"
191

From the comments at the beginning of the file, I expected a CSDB instruction right after this instruction. Is that not needed or added somewhere else (like the follow up commit that adds the masking)?

192

What is a live in and why does this one get added?

222

Is this getting the instruction before the branch instruction? Then that instruction is used as the debug location for the FBB and TBB we got in lines 214/215? Did I understand this correctly?

233

Why is this code in a separate code block?

262

Why doesn't this function return bool to indicate whether instructions were modified?

269–270

Why do these instructions have an immediate operand? It looks like you're passing the default for ISB, but it's not clear to me what that number means for DSB. I checked this manual, but it doesn't say what the immediate operand is used for, only that it's limited in the value it can have:

274

Is having the zero register in ARM as the destination register equivalent to discarding the output or is the zero register affected?

If it's the former, why use CMP instead of SUBS if this instruction discards the result of the instruction in the same way that CMP does?

280

Why use CSINV instead of CSETM here?

297

Why use ADD instead of MOV?

298–314

Why not the following instead of the three above instructions?

AND SP, SP, TaintReg
338–339

Why do you have to get this data every time the function is called? Could these be static since we wouldn't expect them to be different between the functions in the same compilation? Or are there cases where this would be different between two functions?

efriedma added inline comments.
llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp
269–270

The complete architecture reference manual is publicly available at https://developer.arm.com/docs/ddi0487/latest . This should allow you to quickly answer all your questions about the instruction choices.

274

CMP is an alias.

280

CSINV is an alias.

297

MOV is an alias.

298–314

There is no such encoding.

338–339

These can change between functions (with LTO, or certain function attributes).

kristof.beyls marked 30 inline comments as done.Jan 14 2019, 2:53 AM

Hi Kristof,
Sorry about this late review. I wanted to make sure that I understood your implementation of SLH for ARM, so I took a look and made a few comments with some questions on this first patch. Let me know your thoughts. I'm planning to take a look at the other commit you made a few days ago related to SLH too. Hopefully I didn't make too many comments that were already addressed in the follow up commit.

Hi Zola,

I am very happy with you taking a look and asking those questions! Thank you!
I have tried to give clear answers to your questions. Please don't hold back in asking further if anything remains unclear.

llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp
100

I just followed existing practice in the other LLVM passes. Maybe this could be a const variable - not sure. If so, maybe best to make the change in a separate patch with a focus on making this change for all existing passes?

164–166

This pass has seen quite a few iterations over the past couple of months. I'm sure I have seen examples of TBB == FBB in some cases while running the pass across a range of test codes, but I'm afraid I no longer remember where I've seen the example...

172

This is also an idiom used in LLVM. Anding with the string results in some documentation to be printed (the string) when the assert fires.
The string in the assert is often helpful as the actual condition check may be a bit cryptic if you hit an assert you haven't written yourself recently.
For more details, see https://llvm.org/docs/CodingStandards.html#assert-liberally.

191

The CSDB instruction is needed before the register containing the taint is actually used.
In this patch, no uses of the register containing the taint are introduced - that is done in the follow-on patch.
So, indeed, you'll see CSDB instructions being inserted as part of the follow-on patch - where uses of the taint register also get introduced.

192

The LiveIns record which registers are live at the start of the basic block.
Here, the CSEL instruction inserted at the start of the basic block uses (implicitly) the flags AArch64::NZCV. Those flags were set by the previous basic block, where the conditional jump lives that jumps to the current basic block. Therefore, here we have to explicitly mark that the flags are live at the start of the basic block.

222

instr_end() gets the "one past the last instruction" in the basic block, so "--MBB.instr_end()" gets the last instruction in the basic block MBB.
So, the instruction that is used as the debug location is the last instruction.
Presumably that last instruction will be the conditional branch instruction, so the tracking code (CSEL) will get the same debug info as the conditional branch instruction for which it's tracking miss-speculation.

233

There isn't really a need for it - it's just a personal style where I've used the block to indicate the conceptual extent of the "perform correct code generation around function calls and before returns".
I'll remove the nesting level in an update - there is indeed no need for it and it deviates from existing practice.

262

This function always modifies MBB - so there is no need to signal back to the caller whether instructions were modified.

269–270

As Eli pointed to - you should be able to find the information in the location he pointed to.
For example, in section "C6.2.75 DSB", the details state "...SY ... Encoded as CRm = 0b1111."

274

The xzr register exists in AArch64 ("64-bit Arm"), but not in ARM ("32-bit Arm").
You're understanding is otherwise correct. Let me quote the ArmARM ("Arm Architecture Reference Manual"):
"The name XZR represents the zero register for 64-bit operands where an encoding of the value 31 in the
corresponding register field is interpreted as returning zero when read or discarding the result when written."

A little bit of background on instruction aliases: quite a few "instructions" you write in assembly, like CMP are actually encoded as more general instructions - in this case SUBS.
So, the "CMP" instruction only really exists at the assembly level (for convenience) - but once encoded it is just a SUBS instruction.
To quote the ArmARM again on the CMP instruction:
"""
CMP <Xn|SP>, #<imm>{, <shift>}
is equivalent to
SUBS XZR, <Xn|SP>, #<imm> {, <shift>}
and is always the preferred disassembly.
"""

In MachineInstrs, these aliases are not available, only the "real" encodable instructions. So when creating an instruction, you need to use SUBS rather than CMP.
In the comments, I have tried to put in the mapping from the alias to the encodable instruction, as I personally find it handy.

I hope that makes sense?

298–314

As Eli said.
More general, only the following instructions can write to the SP register in AArch64: ADD, SUB, NEG, NEGS, ADDS, SUBS.
Therefore, to be able to AND the SP with another register and write that value to SP, we first need to move the SP to another register.