This is an archive of the discontinued LLVM Phabricator instance.

[asan] Speed up __asan_unpoison_stack_memory on linux
Needs RevisionPublic

Authored by loladiro on Aug 6 2022, 4:09 AM.

Details

Summary

The __asan_unpoison_stack_memory implementation attempts to zero pages more
quickly by asking the kernel to mmap new zero pages over the region to be
zerod. To do this, it performs three system calls, the mmap and then two
madvise calls to restore the madvise flags for the region. For very large
regions, this is generally better than attempting to memset the entire thing,
however, we can do even better (at least on linux), by not touching the memory
map at all and just asking the kernel to replace the memory in question by zero
pages using MADV_DONTNEED. With the current implementation, the kernel needs to
do a fair bit of work to split the original shadow region on the first mmap and
then merge it back together again after the second madvise. By using
MADV_DONTNEED, we can skip this work, while obtaining the same result (replacing
the memory range in question by demand-paged zero pages). In extreme cases,
this can be significantly faster. Consider for example, the following benchmark:

extern void __asan_unpoison_stack_memory(void *, size_t);

int main(void) {
    char *stacks[NSTACKS];
    for (int i = 0; i < NSTACKS; ++i) {
        char *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert(stack != MAP_FAILED);
        stacks[i] = stack;
    }
    for (int i = 0; i < NROUNDS; i++) {
        // This is meant to simulate the repeated and rapid allocation of
        // fiber stacks. Obviously everything but the allocation is cut out,
        // but it's still a reasonable benchmark of the performance of
        // __asan_unpoison_stack_memory.
        __asan_unpoison_stack_memory(stacks[i % NSTACKS], STACK_SIZE);
        /* Stack would be used here by some task, the task finishes and the
           stack gets recycled for the next task. */
    }
    return 0;
}

On my machine, current master takes 1.4s on this benchmark, while this patch
is about 7x faster. Obviously, this is a best-case estimate. Replacing the
mmap by malloc gives timings that are closer together, but nevertheless with
the version in this patch being somewhat faster.

The other benefit, although my primary motivation for this is somewhat more
subtle: In the current implementation, as mentioned above, the kernel needs
to figure out that it can stitch the memory regions back together after
the second madvise. However, in an application that does lots of forking,
we have observed that the kernel is sometimes unable to stich the shadow
memory region back together, shattering it into many small pieces over the
lifetime of the process and eventually exceeding the system maximum map limit.
My understanding is that this is because the kernel will refuse to merge
memory regions that do not share the same anon_vma, which gets established
by fork and even if the fork later concludes, the kernel does not go back
to re-coalesce any mappings it might have missed. Because this patch does
not split the original shadow region, it can avoid this particular issue.

As an addendum to this, I think a similar fracturing would occur on systems
that have decorate_proc_maps enabled, as the kernel will refuse to merge
mappings with different names and the current implementation does not provide
a name for the replacement pages (but does name the shadow regions in general).
However, because ithis is not enabled by default on non-Android Linux,
I have not actually observed this.

Diff Detail

Event Timeline

loladiro created this revision.Aug 6 2022, 4:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2022, 4:09 AM
Herald added a subscriber: Enna1. · View Herald Transcript
loladiro requested review of this revision.Aug 6 2022, 4:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2022, 4:09 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
loladiro updated this revision to Diff 450509.Aug 6 2022, 4:15 AM

clang-format patch

vitalybuka added a subscriber: vitalybuka.
vitalybuka added inline comments.
compiler-rt/lib/asan/asan_poisoning.h
79

can you move ReserveShadowMemoryRange into fallback of ReleaseMemoryPagesToOSAndZeroFill
and ReleaseMemoryPagesToOSAndZeroFill into sanitizer_common?

compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
2376

Just call ReleaseMemoryPagesToOS and CHECK is redundant.

vitalybuka requested changes to this revision.Aug 12 2022, 10:54 AM
This revision now requires changes to proceed.Aug 12 2022, 10:54 AM