Index: cfe/trunk/docs/ShadowCallStack.rst
===================================================================
--- cfe/trunk/docs/ShadowCallStack.rst
+++ cfe/trunk/docs/ShadowCallStack.rst
@@ -8,28 +8,45 @@
 Introduction
 ============
 
-ShadowCallStack is an **experimental** instrumentation pass, currently only
-implemented for x86_64 and aarch64, that protects programs against return
-address overwrites (e.g. stack buffer overflows.) It works by saving a
-function's return address to a separately allocated 'shadow call stack'
-in the function prolog and checking the return address on the stack against
-the shadow call stack in the function epilog.
+ShadowCallStack is an instrumentation pass, currently only implemented for
+aarch64 and x86_64, that protects programs against return address overwrites
+(e.g. stack buffer overflows.) It works by saving a function's return address
+to a separately allocated 'shadow call stack' in the function prolog in
+non-leaf functions and loading the return address from the shadow call stack
+in the function epilog. The return address is also stored on the regular stack
+for compatibility with unwinders, but is otherwise unused.
+
+The aarch64 implementation is considered production ready, and
+an `implementation of the runtime`_ has been added to Android's libc
+(bionic). The x86_64 implementation was evaluated using Chromium and was
+found to have critical performance and security deficiencies, and may be
+removed in a future release of the compiler. This document only describes
+the aarch64 implementation; details on the x86_64 implementation are found
+in the `Clang 7.0.1 documentation`_.
+
+.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
+.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html
 
 Comparison
 ----------
 
-To optimize for memory consumption and cache locality, the shadow call stack
-stores an index followed by an array of return addresses. This is in contrast
-to other schemes, like :doc:`SafeStack`, that mirror the entire stack and
-trade-off consuming more memory for shorter function prologs and epilogs with
-fewer memory accesses. Similarly, `Return Flow Guard`_ consumes more memory with
-shorter function prologs and epilogs than ShadowCallStack but suffers from the
-same race conditions (see `Security`_). Intel `Control-flow Enforcement Technology`_
-(CET) is a proposed hardware extension that would add native support to
-use a shadow stack to store/check return addresses at call/return time. It
-would not suffer from race conditions at calls and returns and not incur the
-overhead of function instrumentation, but it does require operating system
-support.
+To optimize for memory consumption and cache locality, the shadow call
+stack stores only an array of return addresses. This is in contrast to other
+schemes, like :doc:`SafeStack`, that mirror the entire stack and trade-off
+consuming more memory for shorter function prologs and epilogs with fewer
+memory accesses.
+
+`Return Flow Guard`_ is a pure software implementation of shadow call stacks
+on x86_64. It is similar to the ShadowCallStack x86_64 implementation but
+trades off higher memory usage for a shorter prologue and epilogue. Like
+x86_64 ShadowCallStack, it is inherently racy due to the architecture's use
+of the stack for calls and returns.
+
+Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
+extension that would add native support to use a shadow stack to store/check
+return addresses at call/return time. Being a hardware implementation, it
+would not suffer from race conditions and would not incur the overhead of
+function instrumentation, but it does require operating system support.
 
 .. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
 .. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
@@ -37,57 +54,96 @@
 Compatibility
 -------------
 
-ShadowCallStack currently only supports x86_64 and aarch64. A runtime is not
-currently provided in compiler-rt so one must be provided by the compiled
-application.
-
-On aarch64, the instrumentation makes use of the platform register ``x18``.
-On some platforms, ``x18`` is reserved, and on others, it is designated as
-a scratch register. This generally means that any code that may run on the
-same thread as code compiled with ShadowCallStack must either target one
-of the platforms whose ABI reserves ``x18`` (currently Darwin, Fuchsia and
-Windows) or be compiled with the flag ``-ffixed-x18``.
+A runtime is not provided in compiler-rt so one must be provided by the
+compiled application or the operating system. Integrating the runtime into
+the operating system should be preferred since otherwise all thread creation
+and destruction would need to be intercepted by the application.
+
+The instrumentation makes use of the platform register ``x18``. On some
+platforms, ``x18`` is reserved, and on others, it is designated as a scratch
+register. This generally means that any code that may run on the same thread
+as code compiled with ShadowCallStack must either target one of the platforms
+whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
+or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
+compiled without ``-ffixed-x18`` may be run on the same thread as code that
+uses ShadowCallStack by saving the register value temporarily on the stack
+(`example in Android`_) but this should be done with care since it risks
+leaking the shadow call stack address.
+
+.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717
+
+Because of the use of register ``x18``, the ShadowCallStack feature is
+incompatible with any other feature that may use ``x18``. However, there
+is no inherent reason why ShadowCallStack needs to use register ``x18``
+specifically; in principle, a platform could choose to reserve and use another
+register for ShadowCallStack, but this would be incompatible with the AAPCS64.
+
+Special unwind information is required on functions that are compiled
+with ShadowCallStack and that may be unwound, i.e. functions compiled with
+``-fexceptions`` (which is the default in C++). Some unwinders (such as the
+libgcc 4.9 unwinder) do not understand this unwind info and will segfault
+when encountering it. LLVM libunwind processes this unwind info correctly,
+however. This means that if exceptions are used together with ShadowCallStack,
+the program must use a compatible unwinder.
 
 Security
 ========
 
 ShadowCallStack is intended to be a stronger alternative to
 ``-fstack-protector``. It protects from non-linear overflows and arbitrary
-memory writes to the return address slot; however, similarly to
-``-fstack-protector`` this protection suffers from race conditions because of
-the call-return semantics on x86_64. There is a short race between the call
-instruction and the first instruction in the function that reads the return
-address where an attacker could overwrite the return address and bypass
-ShadowCallStack. Similarly, there is a time-of-check-to-time-of-use race in the
-function epilog where an attacker could overwrite the return address after it
-has been checked and before it has been returned to. Modifying the call-return
-semantics to fix this on x86_64 would incur an unacceptable performance overhead
-due to return branch prediction.
-
-The instrumentation makes use of the ``gs`` segment register on x86_64,
-or the ``x18`` register on aarch64, to reference the shadow call stack
-meaning that references to the shadow call stack do not have to be stored in
-memory. This makes it possible to implement a runtime that avoids exposing
-the address of the shadow call stack to attackers that can read arbitrary
-memory. However, attackers could still try to exploit side channels exposed
-by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover the
-address of the shadow call stack.
+memory writes to the return address slot.
+
+The instrumentation makes use of the ``x18`` register to reference the shadow
+call stack, meaning that references to the shadow call stack do not have
+to be stored in memory. This makes it possible to implement a runtime that
+avoids exposing the address of the shadow call stack to attackers that can
+read arbitrary memory. However, attackers could still try to exploit side
+channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
+to discover the address of the shadow call stack.
 
 .. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
 .. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
 .. _`[3]`: https://www.vusec.net/projects/anc/
 
-On x86_64, leaf functions are optimized to store the return address in a
-free register and avoid writing to the shadow call stack if a register is
-available. Very short leaf functions are uninstrumented if their execution
-is judged to be shorter than the race condition window intrinsic to the
-instrumentation.
-
-On aarch64, the architecture's call and return instructions (``bl`` and
-``ret``) operate on a register rather than the stack, which means that
-leaf functions are generally protected from return address overwrites even
-without ShadowCallStack. It also means that ShadowCallStack on aarch64 is not
-vulnerable to the same types of time-of-check-to-time-of-use races as x86_64.
+Unless care is taken when allocating the shadow call stack, it may be
+possible for an attacker to guess its address using the addresses of
+other allocations. Therefore, the address should be chosen to make this
+difficult. One way to do this is to allocate a large guard region without
+read/write permissions, randomly select a small region within it to be
+used as the address of the shadow call stack and mark only that region as
+read/write. This also mitigates somewhat against processor side channels.
+The intent is that the Android runtime `will do this`_, but the platform will
+first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit
+memory allocations in certain processes, as this also limits the number of
+guard regions that can be allocated.
+
+.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
+.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745
+
+The runtime will need the address of the shadow call stack in order to
+deallocate it when destroying the thread. If the entire program is compiled
+with ``-ffixed-x18``, this is trivial: the address can be derived from the
+value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
+region is used, the address of the start of the guard region could then be
+stored at the start of the shadow call stack itself. But if it is possible
+for code compiled without ``-ffixed-x18`` to run on a thread managed by the
+runtime, which is the case on Android for example, the address must be stored
+somewhere else instead. On Android we store the address of the start of the
+guard region in TLS and deallocate the entire guard region including the
+shadow call stack at thread exit. This is considered acceptable given that
+the address of the start of the guard region is already somewhat guessable.
+
+One way in which the address of the shadow call stack could leak is in the
+``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
+runtime `avoids this`_ by only storing the low bits of ``x18`` in the
+``jmp_buf``, which requires the address of the shadow call stack to be
+aligned to its size.
+
+.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49
+
+The architecture's call and return instructions (``bl`` and ``ret``) operate on
+a register rather than the stack, which means that leaf functions are generally
+protected from return address overwrites even without ShadowCallStack.
 
 Usage
 =====
@@ -132,17 +188,7 @@
       return bar() + 1;
     }
 
-Generates the following x86_64 assembly when compiled with ``-O2``:
-
-.. code-block:: gas
-
-    push %rax
-    callq bar
-    add $0x1,%eax
-    pop %rcx
-    retq
-
-or the following aarch64 assembly:
+Generates the following aarch64 assembly when compiled with ``-O2``:
 
 .. code-block:: none
 
@@ -153,33 +199,7 @@
     ldp x29, x30, [sp], #16
     ret
 
-
-Adding ``-fsanitize=shadow-call-stack`` would output the following x86_64
-assembly:
-
-.. code-block:: gas
-
-    mov (%rsp),%r10
-    xor %r11,%r11
-    addq $0x8,%gs:(%r11)
-    mov %gs:(%r11),%r11
-    mov %r10,%gs:(%r11)
-    push %rax
-    callq bar
-    add $0x1,%eax
-    pop %rcx
-    xor %r11,%r11
-    mov %gs:(%r11),%r10
-    mov %gs:(%r10),%r10
-    subq $0x8,%gs:(%r11)
-    cmp %r10,(%rsp)
-    jne trap
-    retq
-
-    trap:
-    ud2
-
-or the following aarch64 assembly:
+Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:
 
 .. code-block:: none