This is an archive of the discontinued LLVM Phabricator instance.

[Scudo] Use GWP-ASan's aligned allocations and fixup postalloc hooks.
ClosedPublic

Authored by hctim on Feb 2 2021, 10:30 AM.

Details

Summary

This patch does a few cleanup things:

  1. The non-standalone scudo has a problem where GWP-ASan allocations may not meet alignment requirements where Scudo was requested to have alignment >= 16. Use the new GWP-ASan API to fix this.
  2. The standalone variant loses some debugging information inside of GWP-ASan because we ask GWP-ASan to allocate an aligned size in the frontend. This means reports end up with 'UaF on a 16-byte allocation' for a 1-byte allocation with 16-byte alignment. Also use the new API to fix this.
  3. Add post-alloc hooks for GWP-ASan intercepted allocations, and add stats tracking for GWP-ASan allocations.
  4. Add a small test that checks the alignment of the frontend allocator, so that it can be used under GWP-ASan torture mode.
  5. Add GWP-ASan torture mode as a testing configuration to catch these regressions.

Depends on D94830, D95889.

Diff Detail

Event Timeline

hctim created this revision.Feb 2 2021, 10:30 AM
hctim requested review of this revision.Feb 2 2021, 10:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2021, 10:30 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
hctim updated this revision to Diff 320905.Feb 2 2021, 2:22 PM

Rebased on top of unwinder locking in D95889.

hctim updated this revision to Diff 320908.Feb 2 2021, 2:25 PM

Changed a comment and reformatted slightly.

hctim updated this revision to Diff 320947.Feb 2 2021, 4:35 PM

Rebase on top of D95542.

I think the skip in the tests will not work on Fuchsia. I'll check.

hctim added a comment.Feb 5 2021, 9:13 AM

I think the skip in the tests will not work on Fuchsia. I'll check.

I'd actually prefer to use --gtest_filter=-ScudoWrappersCppTest\.New, but I can't figure out the lit incantation to change the runflags provided to gtest binaries :(.

I think the skip in the tests will not work on Fuchsia. I'll check.

Currently wrappers_cpp.cpp & test aren't used on Fuchsia.

cryptoad accepted this revision.Feb 8 2021, 9:07 AM
This revision is now accepted and ready to land.Feb 8 2021, 9:07 AM
hctim edited reviewers, added: pcc; removed: eugenis.Feb 8 2021, 11:24 AM
hctim added a subscriber: eugenis.
hctim added inline comments.
compiler-rt/test/scudo/standalone/unit/gwp_asan/lit.site.cfg.py.in
25

@pcc - do you have any clues whether it's possible to configure lit to invoke the gtest binary with --gtest_filter=-ScudoWrappersCppTest\.New instead of this hack?

pcc added inline comments.Feb 8 2021, 11:36 AM
compiler-rt/lib/scudo/standalone/tests/wrappers_cpp_test.cpp
69

Shouldn't it be a compile-time thing here? i.e. #ifdef GWP_ASAN_HOOKS? Otherwise it will fail 1 in N times, right?

hctim marked an inline comment as done.Feb 8 2021, 12:37 PM
hctim added inline comments.
compiler-rt/lib/scudo/standalone/tests/wrappers_cpp_test.cpp
69

We run the scudo_standalone tests twice now, once with SCUDO_OPTIONS=GWP_ASAN_Enabled=0 (i.e. no GWP-ASan), and once with SCUDO_OPTIONS=GWP_ASAN_SampleRate=1:GWP_ASAN_MaxSimultaneousAllocations=100000 and SKIP_TYPE_MISMATCH=1 (i.e. GWP-ASan torture mode).

Lit config for the former is unchanged in test/scudo/standalone/unit/lit.site.cfg.py.in, this patch adds an extra .../unit/gwp_asan/lit.site.cfg.py.in.

hctim marked an inline comment as done.Feb 23 2021, 1:33 PM

@pcc - friendly ping for a final pass when you have some spare time. Could you take a look at the comment in lit.site.cfg.py.in and see if you have any suggestions?

This revision was landed with ongoing or failed builds.Apr 23 2021, 10:08 AM
This revision was automatically updated to reflect the committed changes.
hctim added a comment.Apr 23 2021, 3:37 PM

Broke the gcc bots. Crashes in gtest shutdown. Looks like a GWP-ASan allocation is getting freed by scudo.

* thread #1, name = 'ScudoCxxUnitTes', stop reason = signal SIGSEGV: address access protected (fault address: 0x7fd2f72b0ff0)
    frame #0: 0x0000000000468534 ScudoCxxUnitTest-x86_64-Test`scudo::Allocator<scudo::DefaultConfig, &(malloc_postinit)>::deallocate(void*, scudo::Chunk::Origin, unsigned long, unsigned long) [inlined] scudo::atomic_u64::Type scudo::atomic_load<scudo::atomic_u64>(MO=memory_order_relaxed, A=0x00007fd2f72b0ff0) at atomic_helpers.h:66:16
   63   inline typename T::Type atomic_load(const volatile T *A, memory_order MO) {
   64     DCHECK(!(reinterpret_cast<uptr>(A) % sizeof(*A)));
   65     typename T::Type V;
-> 66     __atomic_load(&A->ValDoNotUse, &V, MO);
   67     return V;
   68   }
   69
(lldb) bt
* thread #1, name = 'ScudoCxxUnitTes', stop reason = signal SIGSEGV: address access protected (fault address: 0x7fd2f72b0ff0)
  * frame #0: 0x0000000000468534 ScudoCxxUnitTest-x86_64-Test`scudo::Allocator<scudo::DefaultConfig, &(malloc_postinit)>::deallocate(void*, scudo::Chunk::Origin, unsigned long, unsigned long) [inlined] scudo::atomic_u64::Type scudo::atomic_load<scudo::atomic_u64>(MO=memory_order_relaxed, A=0x00007fd2f72b0ff0) at atomic_helpers.h:66:16
    frame #1: 0x0000000000468534 ScudoCxxUnitTest-x86_64-Test`scudo::Allocator<scudo::DefaultConfig, &(malloc_postinit)>::deallocate(void*, scudo::Chunk::Origin, unsigned long, unsigned long) [inlined] scudo::atomic_u64::Type scudo::atomic_load_relaxed<scudo::atomic_u64>(A=0x00007fd2f72b0ff0) at atomic_helpers.h:127
    frame #2: 0x0000000000468534 ScudoCxxUnitTest-x86_64-Test`scudo::Allocator<scudo::DefaultConfig, &(malloc_postinit)>::deallocate(void*, scudo::Chunk::Origin, unsigned long, unsigned long) at chunk.h:123
    frame #3: 0x0000000000468534 ScudoCxxUnitTest-x86_64-Test`scudo::Allocator<scudo::DefaultConfig, &(malloc_postinit)>::deallocate(this=0x0000000000499bc0, Ptr=0x00007fd2f72b1000, Origin=<unavailable>, DeleteSize=0, Alignment=<unavailable>) at combined.h:531
    frame #4: 0x000000000041b2c2 ScudoCxxUnitTest-x86_64-Test`testing::internal::TestFactoryImpl<ScudoWrappersCppTest_New_Test>::~TestFactoryImpl(this=0x00007fd2f72b1000) at gtest-internal.h:482:7
    frame #5: 0x000000000042d393 ScudoCxxUnitTest-x86_64-Test`testing::TestInfo::~TestInfo(this=0x00007fd2f72b3ef0) at gtest.cc:2521:25
    frame #6: 0x000000000042e017 ScudoCxxUnitTest-x86_64-Test`void testing::internal::Delete<testing::TestInfo>(x=0x00007fd2f72b3ef0) at gtest-internal-inl.h:341:3
    frame #7: 0x000000000045b4cb ScudoCxxUnitTest-x86_64-Test`void (__first=__normal_iterator<testing::TestInfo *const *, std::vector<testing::TestInfo *, std::allocator<testing::TestInfo *> > > @ 0x00007fffffffc838, __last=__normal_iterator<testing::TestInfo *const *, std::vector<testing::TestInfo *, std::allocator<testing::TestInfo *> > > @ 0x00007fffffffc830, __f=(ScudoCxxUnitTest-x86_64-Test`void testing::internal::Delete<testing::TestInfo>(testing::TestInfo*) at gtest-internal-inl.h:340))(testing::TestInfo*)>(__gnu_cxx::__normal_iterator<testing::TestInfo* const*, std::vector<testing::TestInfo*, std::allocator<testing::TestInfo*> > >, __gnu_cxx::__normal_iterator<testing::TestInfo* const*, std::vector<testing::TestInfo*, std::allocator<testing::TestInfo*> > >, void (*)(testing::TestInfo*)))(testing::TestInfo*) at stl_algo.h:3839:2
    frame #8: 0x000000000044616b ScudoCxxUnitTest-x86_64-Test`void testing::internal::ForEach<std::vector<testing::TestInfo*, std::allocator<testing::TestInfo*> >, void (*)(testing::TestInfo*)>(c=size=3, functor=(ScudoCxxUnitTest-x86_64-Test`void testing::internal::Delete<testing::TestInfo>(testing::TestInfo*) at gtest-internal-inl.h:340))(testing::TestInfo*)) at gtest-internal-inl.h:297:3
    frame #9: 0x000000000042df39 ScudoCxxUnitTest-x86_64-Test`testing::TestCase::~TestCase(this=0x00007fd2f72db000) at gtest.cc:2734:3
    frame #10: 0x000000000042e049 ScudoCxxUnitTest-x86_64-Test`testing::TestCase::~TestCase(this=0x00007fd2f72db000) at gtest.cc:2732:23
    frame #11: 0x0000000000436018 ScudoCxxUnitTest-x86_64-Test`void testing::internal::Delete<testing::TestCase>(x=0x00007fd2f72db000) at gtest-internal-inl.h:341:3
    frame #12: 0x000000000045390b ScudoCxxUnitTest-x86_64-Test`void (__first=__normal_iterator<testing::TestCase *const *, std::vector<testing::TestCase *, std::allocator<testing::TestCase *> > > @ 0x00007fffffffc908, __last=__normal_iterator<testing::TestCase *const *, std::vector<testing::TestCase *, std::allocator<testing::TestCase *> > > @ 0x00007fffffffc900, __f=(ScudoCxxUnitTest-x86_64-Test`void testing::internal::Delete<testing::TestCase>(testing::TestCase*) at gtest-internal-inl.h:340))(testing::TestCase*)>(__gnu_cxx::__normal_iterator<testing::TestCase* const*, std::vector<testing::TestCase*, std::allocator<testing::TestCase*> > >, __gnu_cxx::__normal_iterator<testing::TestCase* const*, std::vector<testing::TestCase*, std::allocator<testing::TestCase*> > >, void (*)(testing::TestCase*)))(testing::TestCase*) at stl_algo.h:3839:2
    frame #13: 0x000000000044833b ScudoCxxUnitTest-x86_64-Test`void testing::internal::ForEach<std::vector<testing::TestCase*, std::allocator<testing::TestCase*> >, void (*)(testing::TestCase*)>(c=size=1, functor=(ScudoCxxUnitTest-x86_64-Test`void testing::internal::Delete<testing::TestCase>(testing::TestCase*) at gtest-internal-inl.h:340))(testing::TestCase*)) at gtest-internal-inl.h:297:3
    frame #14: 0x0000000000435ddc ScudoCxxUnitTest-x86_64-Test`testing::internal::UnitTestImpl::~UnitTestImpl(this=0x00007fd2f72bde00) at gtest.cc:4359:3
    frame #15: 0x0000000000436069 ScudoCxxUnitTest-x86_64-Test`testing::internal::UnitTestImpl::~UnitTestImpl(this=0x00007fd2f72bde00) at gtest.cc:4357:31
    frame #16: 0x0000000000435951 ScudoCxxUnitTest-x86_64-Test`testing::UnitTest::~UnitTest(this=0x0000000000499ab8) at gtest.cc:4305:3
    frame #17: 0x00007ffff7c064d7 libc.so.6`__run_exit_handlers(status=0, listp=0x00007ffff7d86718, run_list_atexit=true, run_dtors=true) at exit.c:108:8
    frame #18: 0x00007ffff7c0667a libc.so.6`__GI_exit(status=<unavailable>) at exit.c:139:3
    frame #19: 0x00007ffff7beed11 libc.so.6`__libc_start_main(main=(ScudoCxxUnitTest-x86_64-Test`main at scudo_unit_test_main.cpp:35), argc=3, argv=0x00007fffffffcb28, init=(ScudoCxxUnitTest-x86_64-Test`__libc_csu_init), fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffcb18) at libc-start.c:342:3
    frame #20: 0x0000000000408b0a ScudoCxxUnitTest-x86_64-Test`_start + 42
hctim added a comment.Apr 23 2021, 4:48 PM

Where GPAForSignalHandler gets zero-initialized:

Watchpoint 3 hit:
old value: 140543992905728
new value: 0
Process 1613407 stopped
* thread #1, name = 'ScudoCxxUnitTes', stop reason = watchpoint 3
    frame #0: 0x0000000000408abc ScudoCxxUnitTest-x86_64-Test`::_GLOBAL__sub_I_wrappers_c.cpp() [inlined] gwp_asan::AllocatorState::AllocatorState(this=<unavailable>) at common.h:85:8
   82   // This holds the state that's shared between the GWP-ASan allocator and the
   83   // crash handler. This, in conjunction with the Metadata array, forms the entire
   84   // set of information required for understanding a GWP-ASan crash.
-> 85   struct AllocatorState {
   86     // Returns whether the provided pointer is a current sampled allocation that
   87     // is owned by this pool.
   88     GWP_ASAN_ALWAYS_INLINE bool pointerIsMine(const void *Ptr) const {
(lldb) bt
* thread #1, name = 'ScudoCxxUnitTes', stop reason = watchpoint 3
  * frame #0: 0x0000000000408abc ScudoCxxUnitTest-x86_64-Test`::_GLOBAL__sub_I_wrappers_c.cpp() [inlined] gwp_asan::AllocatorState::AllocatorState(this=<unavailable>) at common.h:85:8
    frame #1: 0x0000000000408ab5 ScudoCxxUnitTest-x86_64-Test`::_GLOBAL__sub_I_wrappers_c.cpp() [inlined] gwp_asan::GuardedPoolAllocator::GuardedPoolAllocator(this=<unavailable>) at guarded_pool_allocator.h:42
    frame #2: 0x0000000000408a50 ScudoCxxUnitTest-x86_64-Test`::_GLOBAL__sub_I_wrappers_c.cpp() [inlined] scudo::Allocator<scudo::DefaultConfig, &(malloc_postinit)>::Allocator(this=<unavailable>) at combined.h:46
    frame #3: 0x0000000000408a50 ScudoCxxUnitTest-x86_64-Test`::_GLOBAL__sub_I_wrappers_c.cpp() [inlined] __static_initialization_and_destruction_0(__initialize_p=<unavailable>, __priority=<unavailable>) at wrappers_c.cpp:29
    frame #4: 0x0000000000408a50 ScudoCxxUnitTest-x86_64-Test`::_GLOBAL__sub_I_wrappers_c.cpp() at wrappers_c.cpp:36
    frame #5: 0x000000000046d915 ScudoCxxUnitTest-x86_64-Test`__libc_csu_init + 69
    frame #6: 0x00007ffff7beec9a libc.so.6`__libc_start_main(main=(ScudoCxxUnitTest-x86_64-Test`main at scudo_unit_test_main.cpp:35), argc=1, argv=0x00007fffffffcba8, init=(ScudoCxxUnitTest-x86_64-Test`__libc_csu_init), fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffcb98) at libc-start.c:264:6
    frame #7: 0x0000000000408b1a ScudoCxxUnitTest-x86_64-Test`_start + 42

The following is happening:

  1. The loader calls malloc() very early on (in this case, inside of libstdc++.so.6 ___lldb_unnamed_symbol394$$libstdc++.so.6 + 54)
  2. This initializes Scudo and GWP-ASan.
  3. The loader then calls _start, which calls __libc_csu_init, which calls the init array.
  4. The init array calls the scudo Allocator constructor, which zero-initializes the GWP-ASan GPA via. a call to its constructor.

Not clear to me what the solution is yet, nor why this is a glibc-only problem.

hctim added a comment.Apr 23 2021, 5:13 PM

Curious, I can't find dynamic initializers (or even a constructor) for gwp_asan::GuardedPoolAllocator and scudo::Allocator<> in clang's binary, but they exist in the gcc binary. Maybe gcc really wants to give them a dynamic initializer?

hctim reopened this revision.Apr 23 2021, 5:18 PM
This revision is now accepted and ready to land.Apr 23 2021, 5:18 PM
hctim updated this revision to Diff 341387.Apr 28 2021, 7:31 PM

Make GWP-ASan's allocator a static data member and make the state have a
constexpr constructor. This prevents dynamic initializers. This may need
some attention if we end up having multiple initialized Scudo instances
in a program, but I don't know of anywhere where this is the case right
now.

hctim edited reviewers, added: vitalybuka; removed: pcc.Apr 28 2021, 7:32 PM
vitalybuka added inline comments.Apr 29 2021, 1:59 AM
compiler-rt/lib/scudo/standalone/combined.h
988

As-is slight changes still can silently break static initialization.
Lets do D101514.

1388–1390

you can do the following to avoid this template noise:

struct BaseAllocator {
   static gwp_asan::GuardedPoolAllocator GuardedAlloc;
}

struct Allocator: public BaseAllocator

or even just declare outside of the class.

extern gwp_asan::GuardedPoolAllocator GuardedAlloc;

and move definition into cpp file

to my taste it's cleaner, but not critical

hctim updated this revision to Diff 341685.Apr 29 2021, 3:50 PM
hctim marked 2 inline comments as done.

Moved GWP-ASan allocator back to a member, now depends on D101514 where
Scudo enforces static initialization.

compiler-rt/lib/scudo/standalone/combined.h
988

Moved back to member.

1388–1390

Removed - no longer relevant.

hctim updated this revision to Diff 342527.May 3 2021, 1:23 PM

Update to provide correct statistics for live/free allocations in GWP-ASan. Use the slot size in the calculation rather than actual size, as there's no way to pre-calculate the size of the allocations that can land in the guarded pool.

Also, rebase on main.

hctim added a comment.May 3 2021, 1:28 PM

Update to provide correct statistics for live/free allocations in GWP-ASan. Use the slot size in the calculation rather than actual size, as there's no way to pre-calculate the size of the allocations that can land in the guarded pool.

@cryptoad - Just wanted to double check with you that this schema SGTY. mallinfo.uordblks and mallinfo.fordblks for malloc(1) moves up and down by a page if it lands in the GWP-ASan region, which is a little strange, but it's better than the alternative of having 4095 bytes "available" as noted by fordblks that aren't actually allocable.

Update to provide correct statistics for live/free allocations in GWP-ASan. Use the slot size in the calculation rather than actual size, as there's no way to pre-calculate the size of the allocations that can land in the guarded pool.

@cryptoad - Just wanted to double check with you that this schema SGTY. mallinfo.uordblks and mallinfo.fordblks for malloc(1) moves up and down by a page if it lands in the GWP-ASan region, which is a little strange, but it's better than the alternative of having 4095 bytes "available" as noted by fordblks that aren't actually allocable.

The idea is fine. The thing that looks a bit weird is that we are calling add & sub on the GlobalStats which isn't locked (stats do an atomic load & store, not add & sub). In the other instances (Shared/Exclusive Caches or Secondary), the Stats are (by the locking property of the owner). So 2 threads allocating via GWP-ASan concurrently could end up with some discrepancy there?

hctim updated this revision to Diff 343120.May 5 2021, 11:32 AM

Add locks to the global stats, and call the locks around GWP-ASan allocations that aren't part of already mutually exclusive initialization.

Given that GWP-ASan allocations are extremely rare, the runtime performance of this is negligible. Unfortunately, the small code size increase does hurt performance a little, 2.3%-ish. This is assumedly due to decreased icache performance. This number is the same as adding a *single* nop instruction in the fast path, so I have a feeling that we're right on the edge of some alignment threshold right now and the extra branch pushes us beyond the limit.

hctim added a comment.May 5 2021, 2:58 PM

Unfortunately, the small code size increase does hurt performance a little, 2.3%-ish.

Renege - this is based on BM_malloc_free<scudo::AndroidConfig>/64_median run 50x. The longer tests like BM_malloc_free_loop<scudo::AndroidConfig>/64_mean, run 50x, show no performance degradation.

When fixing a the missing initialization below from D101865, the 2.3% performance degradation of BM_malloc_free<scudo::AndroidConfig>/64_median actually flipped to a performance improvement of 2.76%. Clearly that's the margin of error with cache alignment effects, as this change didn't actually touch any of the code under test.

With that in mind, I'm confident in saying that the addition to the slow path has no performance impact except for code alignment perturbation (which is not something that is reasonable to optimise for).

diff --git a/compiler-rt/lib/scudo/standalone/benchmarks/malloc_benchmark.cpp b/compiler-rt/lib/scudo/standalone/benchmarks/malloc_benchmark.cpp
index 0a0d795aa758..78789bd008b0 100644
--- a/compiler-rt/lib/scudo/standalone/benchmarks/malloc_benchmark.cpp
+++ b/compiler-rt/lib/scudo/standalone/benchmarks/malloc_benchmark.cpp
@@ -68,7 +68,7 @@ static void BM_malloc_free_loop(benchmark::State &State) {
   };
   std::unique_ptr<AllocatorT, decltype(Deleter)> Allocator(new AllocatorT,
                                                            Deleter);
+  CurrentAllocator = Allocator.get();
   Allocator->reset();

   const size_t NumIters = State.range(0);
cryptoad accepted this revision.May 10 2021, 12:02 PM
This revision was landed with ongoing or failed builds.May 10 2021, 12:58 PM
This revision was automatically updated to reflect the committed changes.