Changeset View
Standalone View
lib/safestack/safestack.cc
- This file was added.
//===-- safestack.cc ------------------------------------------------------===// | ||||||
// | ||||||
// The LLVM Compiler Infrastructure | ||||||
// | ||||||
// This file is distributed under the University of Illinois Open Source | ||||||
// License. See LICENSE.TXT for details. | ||||||
// | ||||||
//===----------------------------------------------------------------------===// | ||||||
// | ||||||
// This file implements the runtime support for the safe stack protection | ||||||
// mechanism. The runtime manages allocation/deallocation of the unsafe stack | ||||||
// for the main thread, as well as all pthreads that are created/destroyed | ||||||
// during program execution. | ||||||
// | ||||||
//===----------------------------------------------------------------------===// | ||||||
#include <limits.h> | ||||||
#include <pthread.h> | ||||||
#include <stddef.h> | ||||||
kcc: You may find lots of utilities in sanitizer_common useful here.
E.g. try CHECK, CHECK_EQ, etc… | ||||||
#include <sys/resource.h> | ||||||
#include <sys/user.h> | ||||||
Not Done ReplyInline ActionsPlease check if you still need so many headers. Remove anything you don't need. kcc: Please check if you still need so many headers. Remove anything you don't need. | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
#include "interception/interception.h" | ||||||
#include "sanitizer_common/sanitizer_common.h" | ||||||
/// Minimum stack alignment for the unsafe stack. | ||||||
const unsigned kStackAlign = 16; | ||||||
/// Default size of the unsafe stack. This value is only used if the stack | ||||||
/// size rlimit is set to infinity. | ||||||
const unsigned kDefaultUnsafeStackSize = 0x2800000; | ||||||
// TODO: To make accessing the unsafe stack pointer faster, we plan to | ||||||
// eventually store it directly in the thread control block data structure on | ||||||
// platforms where this structure is pointed to by %fs or %gs. This is exactly | ||||||
Not Done ReplyInline ActionsYes. It must also be dynamic to respect pthread stack limits and system-wide rlimits. theraven: Yes. It must also be dynamic to respect pthread stack limits and system-wide rlimits. | ||||||
// the same mechanism as currently being used by the traditional stack | ||||||
// protector pass to store the stack guard (see getStackCookieLocation() | ||||||
// function above). Doing so requires changing the tcbhead_t struct in glibc | ||||||
// on Linux and tcb struct in libc on FreeBSD. | ||||||
Not Done ReplyInline Actionsplease no macros where constants work kcc: please no macros where constants work | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
// | ||||||
// For now, store it in a thread-local variable. | ||||||
extern "C" { | ||||||
__attribute__((visibility( | ||||||
Not Done ReplyInline ActionsWhy not simply force -lpthread in the driver? kcc: Why not simply force -lpthread in the driver? | ||||||
Not Done ReplyInline ActionsThe overhead of adding -pthread (-lpthread is non-portable) to single-threaded programs is measurably greater than the overhead of safe stack. We (current hat: FreeBSD) are willing to ship packages with SSP by default and would be very willing to ship packages with safe-stack, but if it forced -pthread then the overhead would be more than we'd be willing to accept. theraven: The overhead of adding -pthread (-lpthread is non-portable) to single-threaded programs is… | ||||||
"default"))) __thread void *__safestack_unsafe_stack_ptr = nullptr; | ||||||
} | ||||||
Not Done ReplyInline Actionsnullptr here and other pointers below. jfb: `nullptr` here and other pointers below. | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
Not Done ReplyInline Actionsdo you really need this? kcc: do you really need this? | ||||||
Not Done ReplyInline ActionsRemoved; they are hints and are not required for correctness. We can bring them back if benchmarks show a clear benefit. pcc: Removed; they are hints and are not required for correctness. We can bring them back if… | ||||||
// Per-thread unsafe stack information. It's not frequently accessed, so there | ||||||
// it can be kept out of the tcb in normal thread-local variables. | ||||||
static __thread void *unsafe_stack_start = nullptr; | ||||||
static __thread size_t unsafe_stack_size = 0; | ||||||
static __thread size_t unsafe_stack_guard = 0; | ||||||
Not Done ReplyInline ActionsI might be missing something, but can't we just use TLS to store the second stack? kcc: I might be missing something, but can't we just use TLS to store the second stack? | ||||||
Not Done ReplyInline ActionsC11 TLS does not have a way of attaching destructors. C++11 does (although the standard is undefined in the presence of shared library loading and unloading, which makes it less useful), but does not have any way of ordering the destruction. theraven: C11 TLS does not have a way of attaching destructors. C++11 does (although the standard is… | ||||||
Not Done ReplyInline ActionsThe main reason of putting the unsafe stack pointer in the tcb directly is performance: that way, the unsafe stack pointer can be loaded/stored with just one memory access. It's exactly the same optimization as the existing stack protector uses (see X86TargetLowering::getStackCookieLocation in X86ISelLowering.cpp:1925). Following David's suggestion, I will remove this optimization on Linux for now, until it can be properly implemented on the glibc side as well. I plan to only keep it on Darwin (where it seems to be cleaner, as I will explain in the comments). I will use TLS for unsafe_stack_(start|size|guard) on all platforms, as those are not performance critical in any way. ksvladimir: The main reason of putting the unsafe stack pointer in the tcb directly is performance: that… | ||||||
static inline void *unsafe_stack_alloc(size_t size, size_t guard) { | ||||||
CHECK_GE(size + guard, size); | ||||||
void *addr = MmapOrDie(size + guard, "unsafe_stack_alloc"); | ||||||
Not Done ReplyInline ActionsIt's probably not a huge issue for the unsafe stack, but I'd rather check that size + guard doesn't overflow uptr before calling MmapOrDie. The same applies in other parts of the code below. jfb: It's probably not a huge issue for the unsafe stack, but I'd rather check that `size + guard`… | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
MprotectNoAccess((uptr)addr, (uptr)guard); | ||||||
return (char *)addr + guard; | ||||||
Not Done ReplyInline Actionscan this code use more of sanitizer_common? kcc: can this code use more of sanitizer_common?
(here and in the rest of this file) | ||||||
Not Done ReplyInline ActionsYep, this now uses internal_mmap/internal_munmap and internal_mprotect below. pcc: Yep, this now uses internal_mmap/internal_munmap and internal_mprotect below. | ||||||
} | ||||||
Not Done ReplyInline ActionsMmm. I don't like that. kcc: Mmm. I don't like that.
Why can't we just ensure that pthread is linked? | ||||||
Not Done ReplyInline ActionsMy understanding is that on some platforms (FreeBSD?) linking the pthread library has a performance impact. We could probably consider moving pthread stuff to a separate library which is only linked when pthread is linked. I guess for now we can use the pthread symbols unconditionally. Then any later change will mostly be a matter of moving code around. pcc: My understanding is that on some platforms (FreeBSD?) linking the pthread library has a… | ||||||
static inline void unsafe_stack_setup(void *start, size_t size, size_t guard) { | ||||||
void *stack_ptr = (char *)start + size; | ||||||
CHECK_EQ((((size_t)stack_ptr) & (kStackAlign - 1)), 0); | ||||||
Not Done ReplyInline ActionsOverflow check here too, and with guard. jfb: Overflow check here too, and with guard. | ||||||
__safestack_unsafe_stack_ptr = stack_ptr; | ||||||
unsafe_stack_start = start; | ||||||
Not Done ReplyInline ActionsMmapOrDie dies if it fails. You should not need the check kcc: MmapOrDie dies if it fails. You should not need the check | ||||||
Not Done ReplyInline ActionsRemoved pcc: Removed | ||||||
unsafe_stack_size = size; | ||||||
unsafe_stack_guard = guard; | ||||||
} | ||||||
static void unsafe_stack_free() { | ||||||
if (unsafe_stack_start) { | ||||||
UnmapOrDie((char *)unsafe_stack_start - unsafe_stack_guard, | ||||||
Not Done ReplyInline ActionsDrop the (void) case. jfb: Drop the `(void)` case. | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
unsafe_stack_size + unsafe_stack_guard); | ||||||
} | ||||||
unsafe_stack_start = nullptr; | ||||||
} | ||||||
Not Done ReplyInline ActionsThere are magic numbers here again. Please reference either an ABI document or equivalent that indicates that these are safe locations to use. theraven: There are magic numbers here again. Please reference either an ABI document or equivalent that… | ||||||
/// Thread data for the cleanup handler | ||||||
static pthread_key_t thread_cleanup_key; | ||||||
/// Safe stack per-thread information passed to the thread_start function | ||||||
Not Done ReplyInline Actionsno macros, please, unless you have too, in which case describe in comments why. kcc: no macros, please, unless you have too, in which case describe in comments why. | ||||||
Not Done ReplyInline ActionsRemoved pcc: Removed | ||||||
struct tinfo { | ||||||
void *(*start_routine)(void *); | ||||||
void *start_routine_arg; | ||||||
Not Done ReplyInline Actionsdo you still need this comment? kcc: do you still need this comment? | ||||||
Not Done ReplyInline ActionsRemoved pcc: Removed | ||||||
void *unsafe_stack_start; | ||||||
size_t unsafe_stack_size; | ||||||
size_t unsafe_stack_guard; | ||||||
}; | ||||||
/// Wrap the thread function in order to deallocate the unsafe stack when the | ||||||
/// thread terminates by returning from its main function. | ||||||
Not Done ReplyInline ActionsWhy? kcc: Why?
The recent OSX should support TLS well (or not?) | ||||||
Not Done ReplyInline ActionsThis now uses a __thread variable on all platforms. According to Laszlo this was done solely for efficiency reasons, so we should be fine using __thread at least to begin with. I imagine that if the overhead matters in practice we can try to persuade the relevant people at Apple to allocate a TLS slot for us. pcc: This now uses a `__thread` variable on all platforms.
According to Laszlo this was done solely… | ||||||
static void *thread_start(void *arg) { | ||||||
struct tinfo *tinfo = (struct tinfo *)arg; | ||||||
void *(*start_routine)(void *) = tinfo->start_routine; | ||||||
void *start_routine_arg = tinfo->start_routine_arg; | ||||||
// Setup the unsafe stack; this will destroy tinfo content | ||||||
unsafe_stack_setup(tinfo->unsafe_stack_start, tinfo->unsafe_stack_size, | ||||||
tinfo->unsafe_stack_guard); | ||||||
Not Done ReplyInline ActionsI would use MmapOrDie (or create a similar one in sanitizer_common/sanitizer_posix.cc) kcc: I would use MmapOrDie (or create a similar one in sanitizer_common/sanitizer_posix.cc)
and get… | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
// Make sure out thread-specific destructor will be called | ||||||
// FIXME: we can do this only any other specific key is set by | ||||||
// intercepting the pthread_setspecific function itself | ||||||
pthread_setspecific(thread_cleanup_key, (void *)1); | ||||||
Not Done ReplyInline Actionss/intersepting/intercepting/ jfb: s/intersepting/intercepting/ | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
return start_routine(start_routine_arg); | ||||||
} | ||||||
Not Done ReplyInline Actionss/rutine/routine/ Though I'm not sure the comment is really helping me understand much here, I'd just drop it. jfb: s/rutine/routine/
Though I'm not sure the comment is really helping me understand much here… | ||||||
Not Done ReplyInline ActionsRemoved pcc: Removed | ||||||
/// Thread-specific data destructor | ||||||
static void thread_cleanup_handler(void *_iter) { | ||||||
// We want to free the unsafe stack only after all other destructors | ||||||
// have already run. We force this function to be called multiple times. | ||||||
// User destructors that might run more then PTHREAD_DESTRUCTOR_ITERATIONS-1 | ||||||
Not Done ReplyInline Actionsuse CHECK_EQ kcc: use CHECK_EQ | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
// times might still end up executing after the unsafe stack is deallocated. | ||||||
size_t iter = (size_t)_iter; | ||||||
if (iter < PTHREAD_DESTRUCTOR_ITERATIONS) { | ||||||
pthread_setspecific(thread_cleanup_key, (void *)(iter + 1)); | ||||||
} else { | ||||||
// This is the last iteration | ||||||
unsafe_stack_free(); | ||||||
} | ||||||
} | ||||||
/// Intercept thread creation operation to allocate and setup the unsafe stack | ||||||
INTERCEPTOR(int, pthread_create, pthread_t *thread, | ||||||
Not Done ReplyInline ActionsUnmapOrDie kcc: UnmapOrDie | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
const pthread_attr_t *attr, | ||||||
void *(*start_routine)(void*), void *arg) { | ||||||
size_t size = 0; | ||||||
size_t guard = 0; | ||||||
if (attr != NULL) { | ||||||
pthread_attr_getstacksize(attr, &size); | ||||||
pthread_attr_getguardsize(attr, &guard); | ||||||
} else { | ||||||
// get pthread default stack size | ||||||
pthread_attr_t tmpattr; | ||||||
pthread_attr_init(&tmpattr); | ||||||
pthread_attr_getstacksize(&tmpattr, &size); | ||||||
pthread_attr_getguardsize(&tmpattr, &guard); | ||||||
pthread_attr_destroy(&tmpattr); | ||||||
} | ||||||
CHECK_NE(size, 0); | ||||||
Not Done ReplyInline ActionsWhy is this Linux only? If the goal is to avoid LD_PRELOAD and similar interceptions, then it should be done everywhere (possibly with the exception of Darwin) theraven: Why is this Linux only? If the goal is to avoid LD_PRELOAD and similar interceptions, then it… | ||||||
Not Done ReplyInline ActionsDon't you want to use internal_mmap from sanitizer_common/sanitizer_libc.h? kcc: Don't you want to use internal_mmap from sanitizer_common/sanitizer_libc.h? | ||||||
CHECK_EQ((size & (kStackAlign - 1)), 0); | ||||||
CHECK_EQ((guard & (PAGE_SIZE - 1)), 0); | ||||||
void *addr = unsafe_stack_alloc(size, guard); | ||||||
struct tinfo *tinfo = | ||||||
(struct tinfo *)(((char *)addr) + size - sizeof(struct tinfo)); | ||||||
tinfo->start_routine = start_routine; | ||||||
Not Done ReplyInline ActionsOn FreeBSD, MAP_STACK should also be passed. Not sure about other BSDs. In general, however, these checks should be of the form:
#endif theraven: On FreeBSD, MAP_STACK should also be passed. Not sure about other BSDs. In general, however… | ||||||
tinfo->start_routine_arg = arg; | ||||||
tinfo->unsafe_stack_start = addr; | ||||||
tinfo->unsafe_stack_size = size; | ||||||
tinfo->unsafe_stack_guard = guard; | ||||||
return REAL(pthread_create)(thread, attr, thread_start, tinfo); | ||||||
} | ||||||
extern "C" __attribute__((visibility("default"))) | ||||||
#if !SANITIZER_CAN_USE_PREINIT_ARRAY | ||||||
// On ELF platforms, the constructor is invoked using .preinit_array (see below) | ||||||
__attribute__((constructor(0))) | ||||||
#endif | ||||||
void __safestack_init() { | ||||||
Not Done ReplyInline ActionsThis should also use SANITIZER_CAN_USE_PREINIT_ARRAY: attribute((constructor(0))) should be added whenever preinit_array is not used. ksvladimir: This should also use SANITIZER_CAN_USE_PREINIT_ARRAY: __attribute__((constructor(0))) should be… | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
// Determine the stack size for the main thread. | ||||||
size_t size = kDefaultUnsafeStackSize; | ||||||
size_t guard = 4096; | ||||||
struct rlimit limit; | ||||||
if (getrlimit(RLIMIT_STACK, &limit) == 0 && limit.rlim_cur != RLIM_INFINITY) | ||||||
size = limit.rlim_cur; | ||||||
Not Done ReplyInline ActionsIs this initialization check useful if __attribute__((constructor(0))) was used? Can this be done concurrently, or is it just called multiple times from the same thread of execution? If concurrent then initialization is racy. jfb: Is this initialization check useful if `__attribute__((constructor(0)))` was used? Can this be… | ||||||
Not Done ReplyInline Actions
I don't think it is. Removed. (Regarding Vova's comment, we don't currently support linking SafeStack into DSOs.) pcc: > Is this initialization check useful if __attribute__((constructor(0))) was used?
I don't… | ||||||
Not Done ReplyInline ActionsThe primary reason for this check is to handle cases when libsafestack is linked into multiple DSOs and __safestack_init ends up being on multiple constructor lists. It can only be called concurrently if multiple DSOs are being dlopen'ed in parallel. I think that dlopen itself isn't thread-safe and shouldn't be used that way but, perhaps, for extra safety it's useful to make this code non-racy using atomics. ksvladimir: The primary reason for this check is to handle cases when libsafestack is linked into multiple… | ||||||
Not Done ReplyInline ActionsGotcha. Would it be useful to have nothing here for now (@pcc just deleted the code), but add a comment explaining @ksvladimir's point? jfb: Gotcha. Would it be useful to have nothing here for now (@pcc just deleted the code), but add a… | ||||||
Not Done ReplyInline ActionsI don't think we need to add an explanation here. It would only make sense to do so if we supported multiple copies of the library in multiple DSOs, which isn't necessarily something we'll even want to support. pcc: I don't think we need to add an explanation here. It would only make sense to do so if we… | ||||||
// Allocate unsafe stack for main thread | ||||||
void *addr = unsafe_stack_alloc(size, guard); | ||||||
unsafe_stack_setup(addr, size, guard); | ||||||
// Initialize pthread interceptors for thread allocation | ||||||
INTERCEPT_FUNCTION(pthread_create); | ||||||
// Setup the cleanup handler | ||||||
pthread_key_create(&thread_cleanup_key, thread_cleanup_handler); | ||||||
} | ||||||
#if SANITIZER_CAN_USE_PREINIT_ARRAY | ||||||
// On ELF platforms, run safestack initialization before any other constructors. | ||||||
// On other platforms we use the constructor attribute to arrange to run our | ||||||
// initialization early. | ||||||
extern "C" { | ||||||
__attribute__((section(".preinit_array"), | ||||||
used)) void (*__safestack_preinit)(void) = __safestack_init; | ||||||
} | ||||||
#endif | ||||||
extern "C" | ||||||
__attribute__((visibility("default"))) void *__get_unsafe_stack_start() { | ||||||
return unsafe_stack_start; | ||||||
} | ||||||
extern "C" | ||||||
__attribute__((visibility("default"))) void *__get_unsafe_stack_ptr() { | ||||||
return __safestack_unsafe_stack_ptr; | ||||||
Not Done ReplyInline Actionsuse CHECK kcc: use CHECK | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
} | ||||||
Not Done ReplyInline ActionsThis seems somewhat fragile. It doesn't guarantee that the other destructors won't be run after this, only that they get several opportunities to run before. theraven: This seems somewhat fragile. It doesn't guarantee that the other destructors won't be run… | ||||||
Not Done ReplyInline ActionsWhy is this not on Linux? theraven: Why is this not on Linux? | ||||||
Not Done ReplyInline ActionsThis should check RLIMIT_STACK. theraven: This should check RLIMIT_STACK. | ||||||
Not Done ReplyInline ActionsWhy noinline? It is externally visible, so a non-inline version will be produced anyway. Why not let LTO builds inline it? Or is the noinline attribute because this function must be compiled *without* the safe stack support itself? If so, document it (or perhaps add the no_safestack attribute instead, to prevent it from being inlined in unsafe contexts). theraven: Why noinline? It is externally visible, so a non-inline version will be produced anyway. Why… | ||||||
Not Done ReplyInline ActionsI believe that this should work on most ELF platforms. theraven: I believe that this should work on most ELF platforms. | ||||||
Not Done ReplyInline Actionsplease use static for anything that is is not supposed to be called/used from another module. kcc: please use static for anything that is is not supposed to be called/used from another module. | ||||||
Not Done ReplyInline ActionsWhere possible, please reuse functions from sanitizer_common, e.g. here you may try GetThreadStackTopAndBottom kcc: Where possible, please reuse functions from sanitizer_common, e.g. here you may try… | ||||||
Not Done ReplyInline Actionswhy NDEBUG? kcc: why NDEBUG?
I would prefer tnot to have a separate build for compiler-rt
| ||||||
Not Done ReplyInline Actionscan you use SANITIZER_INTERFACE_ATTRIBUTE? kcc: can you use SANITIZER_INTERFACE_ATTRIBUTE? | ||||||
Not Done ReplyInline Actionsplease make this similar to other sanitizers (SANITIZER_CAN_USE_PREINIT_ARRAY). kcc: please make this similar to other sanitizers (SANITIZER_CAN_USE_PREINIT_ARRAY). | ||||||
Not Done ReplyInline ActionsDone pcc: Done | ||||||
Not Done ReplyInline ActionsPlease add a comment on why 2*sizeof(void*) is correct. jfb: Please add a comment on why `2*sizeof(void*)` is correct. | ||||||
Not Done ReplyInline ActionsRemoved pcc: Removed | ||||||
Not Done ReplyInline ActionsIn fact, it might depend on the architecture and even on the calling convention. Perhaps we should just remove this function altogether, as it is not really related to SafeStack, and we ended up not using it anyway neither in Chrome nor FreeBSD. ksvladimir: In fact, it might depend on the architecture and even on the calling convention. Perhaps we… |
You may find lots of utilities in sanitizer_common useful here.
E.g. try CHECK, CHECK_EQ, etc instead of assert.
In general, I'd like to see more code reuse between safe_stack and sanitizers.