Index: docs/ControlFlowIntegrityDesign.rst =================================================================== --- docs/ControlFlowIntegrityDesign.rst +++ docs/ControlFlowIntegrityDesign.rst @@ -498,12 +498,100 @@ the main executable) is the address of that function’s PLT record in the main executable. This would break the CFI checks. +Backward-edge CFI for return statements (RCFI) +============================================== + +This section is a proposal. As of March 2017 it is not implemented. + +Backward-edge control flow (`RET` instructions) can be hijacked +via overwriting the return address (`RA`) on stack. +Various mitigation techniques (e.g. `SafeStack`_, `RFG`_, `Intel CET`_) +try to detect or prevent `RA` corruption on stack. + +RCFI enforces the expected control flow in several different ways described below. +RCFI heavily relies on LTO. + +Leaf Functions +-------------- +If `f()` is a leaf function (i.e. it has no calls +except maybe no-return calls) it can be called using a special calling convention +that stores `RA` in a dedicated register `R` before the `CALL` instruction. +`f()` does not spill `R` and does not use the `RET` instruction, +instead it uses the value in `R` to `JMP` to `RA`. + +This flavour of CFI is *precise*, i.e. the function is guaranteed to return +to the point exactly following the call. + +And alternative approach is to +copy `RA` from stack to `R` in the first instruction of `f()`, +then `JMP` to `R`. +This approach is simpler to implement (does not require changing the caller) +but weaker (there is a small window when `RA` is actually stored on stack). + + +Functions called once +--------------------- +Suppose `f()` is called in just one place in the program +(assuming we can verify this in LTO mode). +In this case we can replace the `RET` instruction with a `JMP` instruction +with the immediate constant for `RA`. +This will *precisely* enforce the return control flow no matter what is stored on stack. + +Another variant is to compare `RA` on stack with the known constant and abort +if they don't match; then `JMP` to the known constant address. + +Functions called in a small number of call sites +------------------------------------------------ +We may extend the above approach to cases where `f()` +is called more than once (but still a small number of times). +With LTO we know all possible values of `RA` and we check them +one-by-one (or using binary search) against the value on stack. +If the match is found, we `JMP` to the known constant address, otherwise abort. + +This protection is *near-precise*, i.e. it guarantees that the control flow will +be transferred to one of the valid return addresses for this function, +but not necessary to the point of the most recent `CALL`. + +General case +------------ +For functions called multiple times a *return jump table* is constructed +in the same manner as jump tables for indirect function calls (see above). +The correct jump table entry (or it's index) is passed by `CALL` to `f()` +(as an extra argument) and then spilled to stack. +The `RET` instruction is replaced with a load of the jump table entry, +jump table range check, and `JMP` to the jump table entry. + +This protection is also *near-precise*. + +Returns from functions called indirectly +---------------------------------------- + +If a function is called indirectly, the return jump table is constructed for the +equivalence class of functions instead of a single function. + +Cross-DSO calls +--------------- +Consider two instrumented DSOs, `A` and `B`. `A` defines `f()` and `B` calls it. + +This case will be handled similarly to the cross-DSO scheme using the slow path callback. + +Non-goals +--------- + +RCFI does not protect `RET` instructions: + * in non-instrumented DSOs, + * in instrumented DSOs for functions that are called from non-instrumented DSOs, + * embedded into other instructions (e.g. `0f4fc3 cmovg %ebx,%eax`). + +.. _SafeStack: https://clang.llvm.org/docs/SafeStack.html +.. _RFG: http://xlab.tencent.com/en/2016/11/02/return-flow-guard +.. _Intel CET: https://software.intel.com/en-us/blogs/2016/06/09/intel-release-new-technology-specifications-protect-rop-attacks Hardware support ================ We believe that the above design can be efficiently implemented in hardware. -A single new instruction added to an ISA would allow to perform the CFI check +A single new instruction added to an ISA would allow to perform the forward-edge CFI check with fewer bytes per check (smaller code size overhead) and potentially more efficiently. The current software-only instrumentation requires at least 32-bytes per check (on x86_64).