# Speculative Execution Side Effect Suppression

## A Load Value Injection Mitigation Technique

Author: Zola Bridges - zbrid@google.com

## Overview of the document

This document describes the performance impact of a mitigation technique,
Speculative Execution Side Effect Suppression (SESES), when applied to code
such as that running in a secure enclave using Intel's SGX. In those
environments, a more privileged attacker may cause speculative execution of the
code in the enclave to inadvertently leak data from the secure enclave memory
region through side channels that can be read outside the enclave by the
attacker. This would bypass the fundamental security model of the secure enclave
by exposing its private, encrypted memory to the more privileged adversary
running on the same machine. Moreover, no well-understood, more targeted
mitigation is readily available for the specific attack we are concerned with,
Load Value Injection (LVI). Instead, we use a mitigation that targets *any* form
of speculative side channel we can identify.

## Overview of the mitigation

As the name suggests, the "speculative execution side effect suppression"
mitigation aims to prevent any effects of speculative execution from escaping
into the microarchitectural domain where they could be observed, thereby closing
off side channel information leaks. It was built as a "big hammer": although it
was built in response to LVI, it is intended to protect against many side
channel vulnerabilities (Spectre v1, Spectre v4, LVI, etc.).

In the case of LVI, we assume that speculative loads from memory (due to explicit
memory access instructions or control flow instructions like RET) may receive
injected data due to address aliasing, and we ensure these injected values
cannot steer later speculative memory accesses in ways that affect cache
contents.

The mitigation is implemented as a compiler pass that inserts a speculation
barrier (LFENCE) just before:

* Each memory read instruction
* Each memory write instruction
* The first branch instruction in a group of terminators at the end of a basic
  block

This is something of a last-resort mitigation: it is expected to have extreme
performance implications, and it may not be a complete mitigation because it
relies on enumerating specific side channel mechanisms. However, it applies to
more variants and styles of gadgets that can reach speculative execution side
channels than just the traditional Spectre Variant 1 gadgets that speculative
load hardening (SLH) targets, far more narrowly but also more efficiently.

While there is a slight risk that this mitigation will be ineffective against
future side channels, we believe there is still significant value in closing the
two side channel classes that are most actively exploited today: control-flow
based (branch predictor or icache) and cache timing. Control flow side channels
are closed by preventing speculative execution into conditionals and indirect
branches. Cache timing side channels are closed by preventing speculative
execution of reads and writes.

We believe this mitigation will be most useful where code handles extremely
sensitive secrets that must not leak, and where a substantial performance hit is
tolerable in service of that overriding goal. As mentioned above, the original
target of this mitigation was the threat of LVI against SGX enclaves handling
critically important secrets.
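The placement rule listed above can be illustrated with a small model. The sketch below is a hypothetical Python simulation, not the LLVM pass: `Instr`, its `may_load`/`may_store`/`is_branch`/`is_terminator` fields, and `seses_block` are illustrative names, and a basic block is assumed to be a plain list of instructions whose trailing terminators are marked. It inserts an LFENCE before every memory read and write and before the first branch among the terminators, emitting only one fence when an instruction matches more than one rule (for example a RET, which both loads and branches).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified stand-in for a machine instruction."""
    text: str
    may_load: bool = False
    may_store: bool = False
    is_branch: bool = False
    is_terminator: bool = False


LFENCE = Instr("lfence")


def seses_block(block: List[Instr]) -> List[Instr]:
    """Return a copy of `block` with an LFENCE before each memory read, each
    memory write, and the first branch in the block's terminator group."""
    out: List[Instr] = []
    branch_fenced = False
    for instr in block:
        first_branch = instr.is_terminator and instr.is_branch and not branch_fenced
        if first_branch:
            branch_fenced = True
        if instr.may_load or instr.may_store or first_branch:
            out.append(LFENCE)  # a single fence covers every rule that matched
        out.append(instr)
    return out


# Modeled on the BoringSSL listing; the trailing `jmp .Lnext` is hypothetical and
# only shows that later branches in the same terminator group are not fenced.
block = [
    Instr("mov %rsi,%rbx"),
    Instr("mov 0x20(%rbp),%r13", may_load=True),
    Instr("mov 0x18(%rbp),%rax", may_load=True),
    Instr("cmp %rsi,%rax"),
    Instr("je d3868", is_branch=True, is_terminator=True),
    Instr("jmp .Lnext", is_branch=True, is_terminator=True),
]
print([i.text for i in seses_block(block)])
```

In this model the register move and the `cmp` stay unfenced, while both loads and the `je` each get one LFENCE, which is the pattern visible in the generated code for the same instructions.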

Credit: The ideas implemented in this mitigation come from folks at Intel,
Chandler Carruth, and others.

### Usage

The flag needed to enable the pass, once the patch is applied to LLVM, is:

`-mllvm -x86-seses-enable`

## Performance impact on BoringSSL benchmarks

### Assumptions

* SGX workloads perform cryptography, so BoringSSL is a representative workload.
* We are most worried about this vulnerability affecting compute workloads.

### Results

I compared one run of the benchmarks with the mitigation enabled to one run with
the mitigation disabled and found the following.

The largest impact was a 95.65% drop in operations per second (roughly 23x
slower), seen during the RSA 4096 signing operations. Such a high impact is
expected given the heavy weight of the mitigation technique. The smallest impact
was during the SHA-1 (8192 bytes) operations, which saw an 84.72% drop in
operations per second (roughly 6.55x slower) when mitigated.

| Metric (mitigated vs. unmitigated) | Value |
| --- | --- |
| Smallest change in operations per second | -84.72% |
| Largest change in operations per second | -95.65% |
| Operations with a decrease of 90% or greater | 68 |
| Largest slowdown (X times slower) | 23.00 |
| Smallest slowdown (X times slower) | 6.55 |
| Average slowdown (X times slower) | 13.48 |
| Total operations benchmarked | 75 |

The two additional runs of each benchmark that I ran are consistent with the
above. The raw data can be found in the spreadsheet listed in the appendix.

## Examples of generated code

### Code snippet from BoringSSL

```
...

00000000000d3820 :
   d3820: 55                    push   %rbp
   d3821: 48 89 e5              mov    %rsp,%rbp
   d3824: 41 57                 push   %r15
   d3826: 41 56                 push   %r14
   d3828: 41 55                 push   %r13
   d382a: 41 54                 push   %r12
   d382c: 53                    push   %rbx
   d382d: 50                    push   %rax
   d382e: 4d 89 c4              mov    %r8,%r12
   d3831: 49 89 ce              mov    %rcx,%r14
   d3834: 49 89 d7              mov    %rdx,%r15
   d3837: 48 89 f3              mov    %rsi,%rbx
   d383a: 0f ae e8              lfence                  # Fence before mem read/write
   d383d: 4c 8b 6d 20           mov    0x20(%rbp),%r13
   d3841: 0f ae e8              lfence                  # Fence before mem read/write
   d3844: 48 8b 45 18           mov    0x18(%rbp),%rax
   d3848: 4a 8d 0c 28           lea    (%rax,%r13,1),%rcx
   d384c: 4a 8d 14 2e           lea    (%rsi,%r13,1),%rdx
   d3850: 48 39 f0              cmp    %rsi,%rax
   d3853: 0f ae e8              lfence                  # Fence before branch
   d3856: 74 10                 je     d3868
   d3858: 48 39 d9              cmp    %rbx,%rcx
   d385b: 0f ae e8              lfence                  # Fence before branch
   d385e: 76 08                 jbe    d3868
...
```

## Optimization of SESES for Load Value Injection Part 1

This section and the following section, "Optimization of SESES for Load Value
Injection Part 2", describe performance optimizations that limit the number of
LFENCEs that need to be added while maintaining security against Load Value
Injection in particular. This is for informational purposes only. This document
defers to Intel for all guidance on the available mitigations and the security
of this mitigation.

### Exploration of performance overhead

The code that results from the following experiments is not necessarily secure.
These experiments were run to begin separating the performance overhead of the
various parts of this mitigation. The data collected will guide subsequent
optimization efforts. More experiments are ongoing as we look at more targeted
identification of vulnerable gadgets.

### Experiment: only one LFENCE per basic block

#### Description

I created a modified version of the mitigation that excludes all LFENCEs except
the first LFENCE in a basic block.
The benchmark results estimate the overhead that comes from the excluded LFENCEs.

#### Flags

`-mllvm -x86-seses-only-first-lfence`

#### Results

Excluding all but the first LFENCE in a basic block removed 69.8% of the LFENCEs
added by the full mitigation. The slowdown ranges from 1.07x to 5.07x across
operations (mean 2.47x), compared with 6.55x to 23.03x (mean 13.48x) for the
full mitigation.

This is promising. If cheaper, more fine-grained mitigations can address the
vulnerabilities within a basic block that the first LFENCE does not cover, we
may be able to lower the overall performance impact of this mitigation.

| LFENCEs in the BoringSSL binary | Value |
| --- | --- |
| Total with the full mitigation applied | 144,683 |
| Total with this experiment applied | 43,739 |
| Percentage excluded by the modification | 69.8% |

| Metric (vs. baseline) | Modified mitigation | Full mitigation |
| --- | --- | --- |
| Smallest change in operations per second | -6.94% | -84.73% |
| Largest change in operations per second | -80.28% | -95.66% |
| Operations with a decrease of 90% or greater | 0 | 68 |
| Largest slowdown (X times slower) | 5.07 | 23.03 |
| Smallest slowdown (X times slower) | 1.07 | 6.55 |
| Average slowdown (X times slower) | 2.47 | 13.48 |
| Total operations benchmarked | 75 | 75 |

### Experiment: no LFENCE for constant or RIP-relative branches

#### Description

I created a modified version of the mitigation that adds fewer LFENCEs before
branch instructions by excluding LFENCEs before branch instructions that read
from constant addresses. An address is considered constant when every input is
either a non-register operand or the RIP register.
Note that a dependence on the EFLAGS register is treated as non-constant,
because we want to prevent leaking EFLAGS data; as a result, all JCC
instructions are still expected to be LFENCEd. LFENCEs before memory reads and
writes were unmodified.

#### Flags

`-mllvm -x86-seses-only-lfence-non-const`

#### Results

This modification would be useful in the final mitigation, but it provides only
a relatively small performance improvement. It is unique among the three
experiments in this section because it may still preserve the integrity of the
mitigation. If the resulting binary is indeed still completely mitigated, it
would be worth making this modification the default rather than keeping it
behind a flag.

| LFENCEs in the BoringSSL binary | Value |
| --- | --- |
| Total with the full mitigation applied | 144,683 |
| Total with this experiment applied | 138,947 |
| Percentage excluded by the modification | 4.0% |

| Metric (vs. baseline) | Modified mitigation | Full mitigation |
| --- | --- | --- |
| Smallest change in operations per second | -84.73% | -84.73% |
| Largest change in operations per second | -95.53% | -95.66% |
| Operations with a decrease of 90% or greater | 68 | 68 |
| Largest slowdown (X times slower) | 22.39 | 23.03 |
| Smallest slowdown (X times slower) | 6.55 | 6.55 |
| Average slowdown (X times slower) | 13.41 | 13.48 |
| Total operations benchmarked | 75 | 75 |

### Experiment: omit LFENCEs for branches

#### Description

I created a modified version of the mitigation that does not insert LFENCEs
before branch instructions. LFENCEs before memory reads and writes were
unmodified.
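The three Part 1 variants differ only in which candidate LFENCEs they keep. The hypothetical Python sketch below counts the fences one basic block would receive under each variant; its keyword arguments are illustrative stand-ins for `-x86-seses-only-first-lfence`, `-x86-seses-omit-branch-lfences`, and `-x86-seses-only-lfence-non-const`, not their actual implementation. `Instr` and `reg_inputs` are assumed names, and `reg_inputs` is assumed to list every register an instruction reads, including the implicit EFLAGS read of a JCC.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified stand-in for a machine instruction."""
    text: str
    may_load: bool = False
    may_store: bool = False
    is_branch: bool = False
    is_terminator: bool = False
    reg_inputs: Tuple[str, ...] = ()  # registers read, including implicit EFLAGS


def has_constant_address(instr: Instr) -> bool:
    """Constant per the description above: every register input is RIP."""
    return all(reg == "rip" for reg in instr.reg_inputs)


def count_fences(block: List[Instr], *, only_first: bool = False,
                 omit_branch: bool = False, only_non_const: bool = False) -> int:
    """Count the LFENCEs one basic block would receive under a Part 1 variant."""
    fences = 0
    branch_fenced = False
    for instr in block:
        needs_fence = instr.may_load or instr.may_store  # reads/writes are candidates
        if instr.is_terminator and instr.is_branch and not branch_fenced:
            skip = omit_branch or (only_non_const and has_constant_address(instr))
            if not skip:
                needs_fence = True
                branch_fenced = True
        if needs_fence:
            fences += 1
            if only_first:
                break  # keep only the first LFENCE in the block
    return fences


# Modeled on the BoringSSL listing: two loads, then a JCC that reads EFLAGS.
loads_then_jcc = [
    Instr("mov 0x20(%rbp),%r13", may_load=True, reg_inputs=("rbp",)),
    Instr("mov 0x18(%rbp),%rax", may_load=True, reg_inputs=("rbp",)),
    Instr("cmp %rsi,%rax", reg_inputs=("rsi", "rax")),
    Instr("je d3868", is_branch=True, is_terminator=True, reg_inputs=("eflags",)),
]
# Hypothetical block: a store followed by a direct jump with no register inputs.
store_then_jmp = [
    Instr("mov %rax,(%rdi)", may_store=True, reg_inputs=("rax", "rdi")),
    Instr("jmp .Lnext", is_branch=True, is_terminator=True),
]
variants = {
    "full": {},
    "only-first-lfence": {"only_first": True},
    "omit-branch-lfences": {"omit_branch": True},
    "only-lfence-non-const": {"only_non_const": True},
}
for name, flags in variants.items():
    print(name, count_fences(loads_then_jcc, **flags),
          count_fences(store_then_jmp, **flags))
```

Because the JCC reads EFLAGS, the non-constant variant still fences it, which matches the note above that all JCC instructions are expected to be LFENCEd; only the direct jump loses its fence.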

#### Flags

`-mllvm -x86-seses-omit-branch-lfences`

#### Results

For some operations, limiting the number of branches that are LFENCEd seems
likely to improve performance.

| LFENCEs in the BoringSSL binary | Value |
| --- | --- |
| Total with the full mitigation applied | 144,683 |
| Total with this experiment applied | 109,568 |
| Percentage excluded by the modification | 24.3% |

| Metric (vs. baseline) | Modified mitigation | Full mitigation |
| --- | --- | --- |
| Smallest change in operations per second | -84.64% | -84.73% |
| Largest change in operations per second | -95.16% | -95.66% |
| Operations with a decrease of 90% or greater | 64 | 68 |
| Largest slowdown (X times slower) | 20.67 | 23.03 |
| Smallest slowdown (X times slower) | 6.51 | 6.55 |
| Average slowdown (X times slower) | 12.73 | 13.48 |
| Total operations benchmarked | 75 | 75 |

## Reproducing these results

### Running the benchmarks

For each experiment, I ran three iterations of the benchmarks without collecting
results, then three iterations while collecting results.

The invocation to run the benchmarks is:

`bssl -- speed`

#### Hardware

All trials ran on an Intel Xeon processor with the Skylake (SKX)
microarchitecture.

#### Build Information

I applied the SESES patches against the following LLVM baseline revision when
running the benchmarks:

LLVM Git commit ID: `25f64629761f583324c716aab319cf6298aed45b`

Both the mitigated binary and the non-mitigated binary were built with
optimization enabled, with assert calls disabled, and without debugging
information.
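When re-deriving the Part 1 summary rows from raw throughput numbers, the "X times slower" figures follow from the percent change in operations per second: a change of p percent leaves a fraction 1 + p/100 of the baseline throughput, so the slowdown is 1 / (1 + p/100). For example, -95.65% gives 1 / 0.0435 ≈ 23.0. The helper names below are hypothetical and are not part of BoringSSL's benchmarking tools.

```python
def percent_change(baseline_ops_per_sec: float, mitigated_ops_per_sec: float) -> float:
    """Percent change in operations per second; negative means fewer ops/sec."""
    return (mitigated_ops_per_sec - baseline_ops_per_sec) / baseline_ops_per_sec * 100.0


def slowdown_factor(change_percent: float) -> float:
    """'X times slower' implied by a percent change in operations per second."""
    remaining = 1.0 + change_percent / 100.0  # fraction of baseline throughput left
    if remaining <= 0.0:
        raise ValueError("throughput cannot drop by 100% or more")
    return 1.0 / remaining


# Largest and smallest decreases reported for the full mitigation in Part 1.
for change in (-95.65, -84.72):
    print(f"{change:+.2f}% ops/sec -> about {slowdown_factor(change):.2f}x slower")
```

Applied to the rounded percentages, this reproduces the reported slowdowns (23.00 and 6.55 for the full mitigation, 5.07 and 1.07 for the one-LFENCE-per-block variant) to within rounding, since the tables were presumably computed from unrounded data.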

##### Glossary of flags

* `-DOPENSSL_NO_ASM`: Excludes all handwritten assembly from the build so that
  fallback code written in C is used instead. This avoids unmitigated assembly
  in the binary.
* `-DBORINGSSL_FIPS=0`: Turns off FIPS mode, which produces build errors when
  combined with `-DOPENSSL_NO_ASM`.

##### Flags for building with the mitigation

```
-mllvm -x86-speculative-execution-side-effect-suppression -O3 -DBORINGSSL_FIPS=0
-DOPENSSL_NO_ASM -DNDEBUG
```

##### Flags for building without the mitigation

```
-O3 -DBORINGSSL_FIPS=0 -DOPENSSL_NO_ASM -DNDEBUG
```

## Optimization of SESES for Load Value Injection Part 2

### Experiment: Omit LFENCE in basic blocks without any loads

#### Description

This variant does not add any LFENCEs to basic blocks without any loads. It is
believed to be secure when used with the full mitigation technique. The
benchmark estimates the overhead that comes from LFENCEing stores and
conditional branches in basic blocks without any loads.

#### Flags

`-mllvm -x86-seses-do-not-lfence-bb-without-loads`

#### Results

The performance of cryptographic operations in BoringSSL with this experiment
enabled, relative to an unmitigated baseline, is summarized below.

| Statistic (vs. baseline) | Modified mitigation | Full mitigation |
| --- | --- | --- |
| Geometric mean | 0.073 | 0.071 |
| Minimum | 0.041 | 0.041 |
| Quartile 1 | 0.060 | 0.060 |
| Median | 0.066 | 0.063 |
| Quartile 3 | 0.081 | 0.077 |
| Maximum | 0.234 | 0.230 |

### Experiment: Omit LFENCE in basic blocks with one load and no stores

#### Description

This variant does not add LFENCEs to basic blocks with one load and no stores.
The benchmark estimates the overhead that comes from the LFENCEs placed on the
single load and on the conditional branches in basic blocks with a single load
and no stores.
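The two basic-block-level variants above skip whole blocks based on their loads and stores. The hypothetical sketch below summarizes each block only by its load count, store count, and whether it ends in a branch, and shows how many LFENCEs each skip rule would remove. It assumes the full mitigation places one fence per load, one per store, and one before the terminator branch, with no instruction being both a memory access and that branch; `BlockSummary` and the function names are illustrative, not taken from the patch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BlockSummary:
    """Hypothetical per-basic-block summary; not a type from the SESES patch."""
    loads: int
    stores: int
    ends_in_branch: bool


def full_fences(bb: BlockSummary) -> int:
    # Assumed full-mitigation cost: one LFENCE per load, one per store, and one
    # before the terminator branch (no instruction is both).
    return bb.loads + bb.stores + int(bb.ends_in_branch)


def skip_without_loads(bb: BlockSummary) -> bool:
    # Illustrative stand-in for -x86-seses-do-not-lfence-bb-without-loads.
    return bb.loads == 0


def skip_one_load_no_stores(bb: BlockSummary) -> bool:
    # Illustrative stand-in for -x86-seses-one-load-no-stores.
    return bb.loads == 1 and bb.stores == 0


def total_fences(blocks: List[BlockSummary],
                 skip: Callable[[BlockSummary], bool] = lambda bb: False) -> int:
    """Total LFENCEs when blocks matching `skip` are left completely unfenced."""
    return sum(full_fences(bb) for bb in blocks if not skip(bb))


blocks = [
    BlockSummary(loads=0, stores=2, ends_in_branch=True),  # stores only
    BlockSummary(loads=0, stores=0, ends_in_branch=True),  # conditional branch only
    BlockSummary(loads=1, stores=0, ends_in_branch=True),  # a single load
    BlockSummary(loads=1, stores=1, ends_in_branch=True),  # one load and one store
]
print(total_fences(blocks))                           # 9 (full mitigation)
print(total_fences(blocks, skip_without_loads))       # 5
print(total_fences(blocks, skip_one_load_no_stores))  # 7
```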

#### Flags

`-mllvm -x86-seses-one-load-no-stores`

#### Results

The performance of cryptographic operations in BoringSSL with this experiment
enabled, relative to an unmitigated baseline, is summarized below.

| Statistic (vs. baseline) | Modified mitigation | Full mitigation |
| --- | --- | --- |
| Geometric mean | 0.073 | 0.071 |
| Minimum | 0.041 | 0.041 |
| Quartile 1 | 0.060 | 0.060 |
| Median | 0.064 | 0.063 |
| Quartile 3 | 0.081 | 0.077 |
| Maximum | 0.232 | 0.230 |

### Experiment: Don't LFENCE before instructions that are data invariant

#### Description

This variant does not LFENCE before instructions that are data invariant. The
list of data invariant instructions used for this purpose is not complete.
Notably, vector instructions, which are used in BoringSSL, are not listed. Given
that this variant appears secure when used alongside the full mitigation in all
non-data-invariant cases, it may be worthwhile to deploy this variant by default
while also updating the list to include vector instructions. Data on the usage
of vector instructions in BoringSSL when compiling without inline assembly is
available on request.

The benchmark estimates the overhead of LFENCEing instructions that are data
invariant, subject to the caveats given above.

#### Flags

`-mllvm -x86-seses-lfence-data-invariant=false`

#### Results

The performance of cryptographic operations in BoringSSL with this experiment
enabled, relative to an unmitigated baseline, is summarized below.

| Statistic (vs. baseline) | Modified mitigation | Full mitigation |
| --- | --- | --- |
| Geometric mean | 0.129 | 0.071 |
| Minimum | 0.058 | 0.041 |
| Quartile 1 | 0.104 | 0.060 |
| Median | 0.112 | 0.063 |
| Quartile 3 | 0.139 | 0.077 |
| Maximum | 0.459 | 0.230 |

## Reproducing these results

### Running the benchmarks

For each experiment in Part 2, I ran 20 iterations of the benchmarks while
collecting results.
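The Part 2 tables summarize each configuration with a geometric mean, minimum, quartiles, and maximum. A minimal sketch of such a summary using Python's `statistics` module (Python 3.8 or later) follows. It assumes each input is the ratio of mitigated to baseline operations per second for one benchmarked operation; under that reading, the full mitigation's geometric mean of 0.071 corresponds to roughly a 14x slowdown. It also assumes the default quartile method of `statistics.quantiles`, since the document does not say how its quartiles were computed. The sample ratios are made up for illustration and are not the measured data.

```python
import statistics
from typing import Dict, List


def summarize(ratios: List[float]) -> Dict[str, float]:
    """Summary statistics laid out like the Part 2 tables."""
    q1, median, q3 = statistics.quantiles(ratios, n=4)  # default 'exclusive' method
    return {
        "geometric_mean": statistics.geometric_mean(ratios),
        "minimum": min(ratios),
        "quartile_1": q1,
        "median": median,
        "quartile_3": q3,
        "maximum": max(ratios),
    }


# Hypothetical mitigated/baseline throughput ratios, one per benchmarked operation.
sample = [0.05, 0.06, 0.07, 0.08, 0.20]
for name, value in summarize(sample).items():
    print(f"{name}: {value:.3f}")
```

A geometric mean is the natural average for ratios like these, because it treats a 2x slowdown and a 2x speedup symmetrically, whereas an arithmetic mean is pulled upward by the few least-affected operations.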

#### Hardware

All trials ran on an Intel Xeon processor with the Skylake (SKX)
microarchitecture.

#### Build Information

I applied the SESES patches against the following LLVM baseline revision when
running the benchmarks:

LLVM Git commit ID: `be4704bd41a4dd8bb5c4dd5a614744c69fb3cf8e`

Both the mitigated binary and the non-mitigated binary were built with
optimization enabled, with assert calls disabled, and without debugging
information.