This is an archive of the discontinued LLVM Phabricator instance.

cfe/trunk/docs/ControlFlowIntegrityDesign.rst
512	Right now the definition of a "function called once" (etc.) would depend on inlining decisions (although with ThinLTO the summary controls which functions are imported, all final inlining decisions are made by individual compilation units). So without changes to this flow you basically would need these steps:

1. thin link
2. thin backends (optimization phase)
3. "second thin link": make global decisions about which functions are called once etc.
4. thin backends (code generation phase)

The third step would necessarily be a serialization over all thin backends.

In principle, we could change ThinLTO so that all final inlining decisions are made in the thin link phase. Then we would be able to classify functions at thin link time. But I foresee that as being difficult: the code that decides whether inlining is possible is already quite complex: http://llvm-cs.pcc.me.uk/lib/Analysis/InlineCost.cpp#1466. That code would need to be re-implemented in the thin link to avoid soundness issues.

We would also need to be careful about changing the number of call sites for a particular function. At least we would need to prevent the thin backend from duplicating a call site, as that could potentially change the calling convention. So we'd need to attach the `noduplicate` attribute to all call sites for functions deemed to be "called once". If we allow removing call sites, we would need to come up with some scheme to allow the size of the jump table to vary.

To me all signs point to this being better implemented in the linker rather than LTO (or as some sort of postprocessing step over the object files produced by LTO).

But I foresee that as being difficult

Yep. That pesky phase-ordering again.

Revision Contents

Path: cfe/trunk/docs/ControlFlowIntegrityDesign.rst
Size: 90 lines

Diff 92379

cfe/trunk/docs/ControlFlowIntegrityDesign.rst

				Position-independent executable requirement
				-------------------------------------------

				Cross-DSO CFI mode requires that the main executable is built as PIE.
				In non-PIE executables the address of an external function (taken from
				the main executable) is the address of that function’s PLT record in
				the main executable. This would break the CFI checks.

				Backward-edge CFI for return statements (RCFI)
				==============================================

				This section is a proposal. As of March 2017 it is not implemented.

				Backward-edge control flow (`RET` instructions) can be hijacked
				by overwriting the return address (`RA`) on the stack.
				Various mitigation techniques (e.g. `SafeStack`_, `RFG`_, `Intel CET`_)
				try to detect or prevent `RA` corruption on the stack.

				RCFI enforces the expected control flow in several different ways described below.
				RCFI heavily relies on LTO.
				pcc: Right now the definition of a "function called once" (etc.) would depend on inlining decisions…

				Leaf Functions
				--------------
				If `f()` is a leaf function (i.e. it has no calls
				except possibly no-return calls) it can be called using a special calling convention
				that stores `RA` in a dedicated register `R` before the `CALL` instruction.
				`f()` does not spill `R` and does not use the `RET` instruction;
				instead it uses the value in `R` to `JMP` to `RA`.

				This flavour of CFI is precise, i.e. the function is guaranteed to return
				to the point exactly following the call.

				An alternative approach is to
				copy `RA` from the stack to `R` in the first instruction of `f()`,
				then `JMP` to `R`.
				This approach is simpler to implement (it does not require changing the caller)
				but weaker (there is a small window when `RA` is actually stored on the stack).


				Functions called once
				---------------------
				Suppose `f()` is called in just one place in the program
				(assuming we can verify this in LTO mode).
				In this case we can replace the `RET` instruction with a `JMP` instruction
				with the immediate constant for `RA`.
				This will precisely enforce the return control flow no matter what is stored on the stack.

				Another variant is to compare `RA` on the stack with the known constant and abort
				if they don't match; then `JMP` to the known constant address.

				Functions called in a small number of call sites
				------------------------------------------------
				We may extend the above approach to cases where `f()`
				is called more than once (but still a small number of times).
				With LTO we know all possible values of `RA` and we check them
				one-by-one (or using binary search) against the value on the stack.
				If a match is found, we `JMP` to the known constant address, otherwise abort.

				This protection is near-precise, i.e. it guarantees that the control flow will
				be transferred to one of the valid return addresses for this function,
				but not necessarily to the point of the most recent `CALL`.
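
The check described in this section can be modelled in ordinary C++ as a rough sketch. The names `kValidReturnAddrs`, `isValidReturnAddr` and `checkedReturnTarget` and the address values are hypothetical and are not part of the proposal; a real implementation would emit the comparison sequence and the `JMP` directly in the function epilogue.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical return addresses for the call sites of f(), standing in for
// what LTO would know. The values are made up and kept sorted so that a
// binary search can be used.
constexpr std::array<std::uintptr_t, 4> kValidReturnAddrs = {
    0x401010, 0x401a34, 0x402200, 0x4030f8};

// Returns true if `ra` is one of the known return addresses of f().
bool isValidReturnAddr(std::uintptr_t ra) {
  // Binary search is only correct on a sorted table.
  assert(std::is_sorted(kValidReturnAddrs.begin(), kValidReturnAddrs.end()));
  return std::binary_search(kValidReturnAddrs.begin(),
                            kValidReturnAddrs.end(), ra);
}

// Models the instrumented epilogue: abort on a mismatch, otherwise hand back
// the address that the real code would JMP to.
std::uintptr_t checkedReturnTarget(std::uintptr_t ra_on_stack) {
  if (!isValidReturnAddr(ra_on_stack))
    std::abort();
  return ra_on_stack;
}
```

Any address in the table is accepted regardless of which call site was actually used, which is why the protection is only near-precise.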

				General case
				------------
				For functions called multiple times a return jump table is constructed
				in the same manner as jump tables for indirect function calls (see above).
				The correct jump table entry (or its index) is passed by `CALL` to `f()`
				(as an extra argument) and then spilled to the stack.
				The `RET` instruction is replaced with a load of the jump table entry,
				jump table range check, and `JMP` to the jump table entry.

				This protection is also near-precise.
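
As an illustration only, the return jump table can be simulated in C++ by treating the code after each call site as a continuation function. Everything named here (`kReturnTable`, `afterCallSite0`, `afterCallSite1`, the extra `return_index` parameter) is an assumption made for the sketch; the proposal itself works at the level of `CALL`/`JMP` instructions and a linker-visible table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical continuations standing in for the code after each call site.
int afterCallSite0(int v) { return v + 1; }
int afterCallSite1(int v) { return v * 2; }

using Continuation = int (*)(int);

// Return jump table for f(): one entry per call site.
constexpr Continuation kReturnTable[] = {afterCallSite0, afterCallSite1};
constexpr std::size_t kReturnTableSize =
    sizeof(kReturnTable) / sizeof(kReturnTable[0]);

// f() receives the jump table index as an extra argument. Instead of RET it
// range-checks the index and transfers control through the table.
int f(int x, std::size_t return_index) {
  int result = x + 10;  // the function body
  if (return_index >= kReturnTableSize)
    std::abort();  // corrupted index: not a valid return target
  assert(kReturnTable[return_index] != nullptr);
  return kReturnTable[return_index](result);
}
```

A corrupted but in-range index still lands on a valid entry of the table, just possibly the wrong one, which matches the near-precise guarantee stated above.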

				Returns from functions called indirectly
				----------------------------------------

				If a function is called indirectly, the return jump table is constructed for the
				equivalence class of functions instead of a single function.

				Cross-DSO calls
				---------------
				Consider two instrumented DSOs, `A` and `B`. `A` defines `f()` and `B` calls it.

				This case will be handled similarly to the cross-DSO scheme using the slow path callback.

				Non-goals
				---------

				RCFI does not protect `RET` instructions:

				* in non-instrumented DSOs,
				* in instrumented DSOs for functions that are called from non-instrumented DSOs,
				* embedded into other instructions (e.g. `0f4fc3 cmovg %ebx,%eax`).

				.. _SafeStack: https://clang.llvm.org/docs/SafeStack.html
				.. _RFG: http://xlab.tencent.com/en/2016/11/02/return-flow-guard
				.. _Intel CET: https://software.intel.com/en-us/blogs/2016/06/09/intel-release-new-technology-specifications-protect-rop-attacks

				Hardware support
				================

				We believe that the above design can be efficiently implemented in hardware.
				A single new instruction added to an ISA would make it possible to perform the forward-edge CFI check
				with fewer bytes per check (smaller code size overhead) and potentially more
				efficiently. The current software-only instrumentation requires at least
				32 bytes per check (on x86_64).
				A hardware instruction would probably take less than ~12 bytes.
				Such an instruction would check that the argument pointer is in-bounds
				and is properly aligned, and if the checks fail it will either trap (in monolithic scheme)
				or call the slow path function (cross-DSO scheme).
				The bit vector lookup is probably too complex for a hardware implementation.
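
The semantics of such an instruction (bounds check plus alignment check) could look roughly like the following C++ sketch. The function name `cfiCheck`, its parameters and the `CheckResult` type are hypothetical; the text does not specify an encoding or operand layout, and the choice between trapping and calling the slow path is left to the caller here.

```cpp
#include <cassert>
#include <cstdint>

enum class CheckResult { Ok, Fail };

// Checks that `ptr` lies in [base, base + size_bytes) and that its offset
// from `base` is a multiple of 2^align_log2.
CheckResult cfiCheck(std::uintptr_t ptr, std::uintptr_t base,
                     std::uintptr_t size_bytes, unsigned align_log2) {
  assert(align_log2 < sizeof(std::uintptr_t) * 8);
  // Unsigned subtraction wraps to a huge value when ptr < base, so a single
  // comparison covers both the lower and the upper bound.
  std::uintptr_t offset = ptr - base;
  std::uintptr_t align_mask = (std::uintptr_t{1} << align_log2) - 1;
  bool in_bounds = offset < size_bytes;
  bool aligned = (offset & align_mask) == 0;
  return (in_bounds && aligned) ? CheckResult::Ok : CheckResult::Fail;
}
```

On `CheckResult::Fail` the monolithic scheme would trap, while the cross-DSO scheme would call the slow path function.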

Proposal: Backward-edge CFI for return statements (RCFI)
Closed, Public