This is an archive of the discontinued LLVM Phabricator instance.

[scudo][tests] Store the allocator instance in a global rather than posix_memalign it
ClosedPublic

Authored by leonardchan on Aug 24 2023, 12:41 PM.

Details

Summary

The combined scudo allocator object is over 4MB in size and gets created via posix_memalign on every test run. If the tests are sanitized with asan, then the asan allocator will need to mmap this large object every single time a test is run. Depending on where this is mapped, we might not be able to find a large enough contiguous space for scudo's primary allocator to reserve an arena. Such a case is more likely to occur on the 39-bit vma for RISCV, where the arena size is roughly a quarter of the whole address space and fragmentation can be a big issue.

This helps reduce fragmentation by placing the allocator instance in global storage rather than in an anonymous mmap.
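The idea can be sketched with a small, self-contained example. `BigObject` is a hypothetical stand-in for the >4MB allocator; the real patch applies the same pattern to `TestAllocator<Config>` in combined_test.cpp. A class-specific `operator new` hands out static storage from .bss instead of calling posix_memalign:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the >4MiB scudo allocator object.
struct BigObject {
  char Payload[4 << 20];

  // Hand out static storage from .bss instead of posix_memalign'd memory,
  // so constructing the object never adds an anonymous mapping.
  static void *operator new(std::size_t Size) {
    assert(Size <= sizeof(Storage) && "object does not fit the storage");
    assert(!Live && "only one live instance at a time");
    Live = true;
    return Storage;
  }
  static void operator delete(void *Ptr) {
    assert(Ptr == static_cast<void *>(Storage));
    Live = false;
  }

  static bool Live;
  static char Storage[4 << 20];
};

bool BigObject::Live = false;
char BigObject::Storage[4 << 20];
```

Because the storage is part of the executable's load image, it never competes with the primary allocator's arena for anonymous mapping space.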

Diff Detail

Event Timeline

leonardchan created this revision.Aug 24 2023, 12:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2023, 12:41 PM
leonardchan requested review of this revision.Aug 24 2023, 12:41 PM
Herald added subscribers: Restricted Project, wangpc. · View Herald Transcript

This is reducing the fragmentation implicitly by having the large 4MiB fragment glued onto the executable's load image and thus avoiding that one fragment being two fragments. It's not really ideal to have this in bss when it's only actually used by some tests. I think this is an acceptable workaround. But if it's feasible to arrange things such that the test will just allocate the large arena chunk first, before the smaller 4MiB allocation, the space must be available somewhere (otherwise it wouldn't work to occupy that same amount of space by increasing the contiguous executable load image size).

compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
191

This needs comments about why it's preallocated in a global.

195

This should pair with the operator delete to assert that only one instance is live at a time, rather than giving out the same storage for each operator new when it might still be in use from the last one.

This needs to assert that the size is the expected one (or <= it).

Sorry I'm not familiar with the 39 bit vma. Could you share more context behind this? Is this only causing issue on Fuchsia RISCV?

compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
178

DefaultConfig

182

DefaultConfig

leonardchan marked 4 inline comments as done.Aug 24 2023, 2:03 PM

Sorry I'm not familiar with the 39 bit vma. Could you share more context behind this? Is this only causing issue on Fuchsia RISCV?

RISCV has a mode that supports a 39-bit virtual address space, which is much smaller than other platforms that normally support up to 48. Most of the allocator configurations for riscv reserve a significant fraction of this address space (roughly 1/5th or 1/4th of it) for the primary allocator. Normally this would still be fine, but if the test is sanitized with asan, then we essentially have two allocators reserving a lot of space: the arena from the primary allocator in the scudo allocator instance, and asan's allocator. There is technically still enough space for both allocators to exist, but because they're so large and the VMA is small, it's very easy for other anonymous mmap'd allocations to be placed somewhere that would prevent the primary allocator from reserving enough space for its arena. One such allocation is the scudo allocator instance itself.

The scenario where we've encountered this is Fuchsia + RISCV Sv39 + asan instrumentation, but this isn't unique to Fuchsia and can also be reproduced on Linux + RISCV Sv39 + asan.
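For concreteness, the address-space arithmetic described above looks roughly like this (the quarter-of-the-space figure is taken from the discussion; the exact arena fraction depends on the scudo config):

```cpp
#include <cassert>
#include <cstdint>

// Sv39 gives 2^39 bytes of virtual address space vs 2^48 on typical
// 64-bit platforms. With a primary arena of roughly a quarter of the
// space, two large reservations (scudo's arena plus asan's mappings)
// leave little contiguous room for anything else.
constexpr uint64_t GiB = 1ULL << 30;
constexpr uint64_t Sv39Space = 1ULL << 39;       // 512 GiB total
constexpr uint64_t Sv48Space = 1ULL << 48;       // 256 TiB total
constexpr uint64_t ArenaQuarter = Sv39Space / 4; // 128 GiB arena
```

A single stray 4MB anonymous mapping placed in the middle of the remaining range can split it into pieces, none of which fits a 128 GiB contiguous reservation.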

This is reducing the fragmentation implicitly by having the large 4MiB fragment glued onto the executable's load image and thus avoiding that one fragment being two fragments. It's not really ideal to have this in bss when it's only actually used by some tests. I think this is an acceptable workaround. But if it's feasible to arrange things such that the test will just allocate the large arena chunk first, before the smaller 4MiB allocation, the space must be available somewhere (otherwise it wouldn't work to occupy that same amount of space by increasing the contiguous executable load image size).

I believe this is something I've discussed with @Caslyn and @fabio-d offline before: reserving the arena first before any allocations for tests are run, then passing this reserved area to scudo rather than scudo doing the anonymous mapping itself. I'm fine with this approach also if other folks would prefer that.

Sorry I'm not familiar with the 39 bit vma. Could you share more context behind this? Is this only causing issue on Fuchsia RISCV?

RISCV has a mode that supports a 39-bit virtual address space, which is much smaller than other platforms that normally support up to 48. Most of the allocator configurations for riscv reserve a significant fraction of this address space (roughly 1/5th or 1/4th of it) for the primary allocator. Normally this would still be fine, but if the test is sanitized with asan, then we essentially have two allocators reserving a lot of space: the arena from the primary allocator in the scudo allocator instance, and asan's allocator. There is technically still enough space for both allocators to exist, but because they're so large and the VMA is small, it's very easy for other anonymous mmap'd allocations to be placed somewhere that would prevent the primary allocator from reserving enough space for its arena. One such allocation is the scudo allocator instance itself.

The scenario where we've encountered this is Fuchsia + RISCV Sv39 + asan instrumentation, but this isn't unique to Fuchsia and can also be reproduced on Linux + RISCV Sv39 + asan.

Thanks for sharing the details! In this case, I may consider having a mode which doesn't need to reserve the space for all regions. That is, instead of reserving N_REGIONS * REGION_SIZE at the beginning, it would request REGION_SIZE only when a region is used. This would mitigate the issue here. I'll evaluate the feasibility of supporting that mode, and I'm fine with this workaround.
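The lazy-reservation mode being described could look roughly like the sketch below. All names here are illustrative, not the real scudo primary allocator API: each size-class region is mmap'd on first use rather than reserving NumRegions * RegionSize contiguously at startup.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Illustrative constants; real scudo region sizes are much larger.
constexpr std::size_t NumRegions = 8;
constexpr std::size_t RegionSize = 1 << 20; // 1 MiB per region

void *Regions[NumRegions] = {};

// Map a region only when its size class is first used, instead of
// reserving NumRegions * RegionSize up front.
void *getRegion(std::size_t ClassId) {
  assert(ClassId < NumRegions);
  if (!Regions[ClassId]) {
    void *P = mmap(nullptr, RegionSize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(P != MAP_FAILED);
    Regions[ClassId] = P;
  }
  return Regions[ClassId];
}
```

The trade-off is that the regions are no longer guaranteed to be contiguous, which is exactly what makes the upfront reservation expensive on a 39-bit VMA.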

compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
193

I would suggest adding more comment about the RISCV 39bit vma mode and mention that this is a workaround.

194

By thinking of this a little bit, can we reorganize the code like

#if SCUDO_RISCV64
// Comment about why we need this
struct TestAllocatorStorage {
  static constexpr size_t kMaxSize = std::max(/* all configs used here */);
  static constexpr size_t kMaxAlign = std::max(/* all configs used here */);
  // To alleviate some problem, let's skip the thread safety analysis here.
  static void *get() NO_THREAD_SAFETY_ANALYSIS {
    M.lock();
    return AllocatorStorage;
  }

  static void release(void *ptr) NO_THREAD_SAFETY_ANALYSIS {
    M.assertHeld();
    M.unlock();
    // maybe some check like ptr == AllocatorStorage
  }

  static scudo::HybridMutex M;
  static uint8_t AllocatorStorage[kMaxSize];
};
scudo::HybridMutex TestAllocatorStorage::M;
alignas(TestAllocatorStorage::kMaxAlign)
    uint8_t TestAllocatorStorage::AllocatorStorage[kMaxSize];
#else
struct TestAllocatorStorage {
  // default posix_memalign solution
};
#endif

template <typename Config>
void *TestAllocator<Config>::operator new(size_t size) {
  return TestAllocatorStorage::get();
}

template <typename Config>
void TestAllocator<Config>::operator delete(void *ptr) {
  TestAllocatorStorage::release(ptr);
}

And move this right above TestAllocator.

fabio-d added inline comments.Aug 25 2023, 4:43 AM
compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
194

+1 for this approach

Maybe the two "M" and "AllocatorStorage" fields can be made non-static, and instead the whole TestAllocatorStorage object can be allocated statically? (This would avoid having the alignas separated from its corresponding declaration.)

Btw, do we really need the mutex given that tests run in a single thread? I think a simple assertion like @leonardchan did would work too, no?

add @cferris for vis

compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
194

I prefer not to have a single-thread assumption between different tests. I think it'd be better to make it sequential instead of crashing in multithreaded cases (and we will end up making it sequential when that happens).

This is just a workaround; I intend to remove it later anyway.

leonardchan marked 4 inline comments as done.
Chia-hungDuan added inline comments.Aug 25 2023, 3:08 PM
compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
172–177

Sorry I didn't say it clearly. We don't need to put the kMaxSize/kMaxAlign in the conditional compilation. Just having one under SCUDO_RISCV64 is enough, which looks like:

#if SCUDO_RISCV64
constexpr size_t kMaxSize =
    std::max({sizeof(TestAllocator<scudo::DefaultConfig>),
              sizeof(TestAllocator<scudo::FuchsiaConfig>),
              sizeof(TestAllocator<scudo::AndroidSvelteConfig>),
              sizeof(TestAllocator<scudo::AndroidConfig>)});
...
229–239

By moving the kMaxAlign/kMaxSize, we can move TestAllocatorStorage up to TestAllocator and inline these two

leonardchan marked an inline comment as done.
leonardchan added inline comments.
compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
172–177

Done, although I don't think I can achieve exactly what you're mentioning:

  • If TestAllocatorStorage, kMaxSize, and kMaxAlign should go before TestAllocator, then kMaxSize and kMaxAlign will only be able to use scudo::Allocator rather than TestAllocator. I think they should have the exact same size and alignment though.
  • If the default allocation should use posix_memalign, then kMaxAlign will need to be outside of and before the SCUDO_RISCV64 block, since TestAllocatorStorage::operator new under the !SCUDO_RISCV64 branch needs an alignment, but it won't know which alignment to use if it's before TestAllocator.
  • TestAllocatorStorage::get needs to accept a size to pass to posix_memalign, unless we also want it to use kMaxSize. In that case, kMaxSize would need to go before the #if SCUDO_RISCV64.
Chia-hungDuan added inline comments.Aug 29 2023, 3:19 PM
compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
74

This should be enclosed by

#if SCUDO_CAN_USE_PRIMARY64
              sizeof(scudo::Allocator<scudo::FuchsiaConfig>),
#endif

Ideally we want this to be used without conditional compilation. I'll be working on that later.

Also suggested a code move down below so that we won't have unused variable warning on non-RISCV64 path

79

Same as above

172–177

Sorry, that's my bad. I thought it was scudo::Allocator instead of TestAllocator. What you did in the last patch set is correct.

To avoid ambiguity, here is what it should look like:

template <typename Config> struct TestAllocator : scudo::Allocator<Config> {
  // ...
};

constexpr size_t kMaxAlign = std::max({
  alignof(TestAllocator<scudo::DefaultConfig>),
#if SCUDO_CAN_USE_PRIMARY64
      alignof(TestAllocator<scudo::FuchsiaConfig>),
#endif
      alignof(TestAllocator<scudo::AndroidSvelteConfig>),
      alignof(TestAllocator<scudo::AndroidConfig>)
});

#if SCUDO_RISCV64
// The allocator is over 4MB large. Rather than creating an instance of this on
// the heap, keep it in a global storage to reduce fragmentation from having to
// mmap this at the start of every test.
struct TestAllocatorStorage {
  static constexpr size_t kMaxSize = std::max({
    sizeof(TestAllocator<scudo::DefaultConfig>),
#if SCUDO_CAN_USE_PRIMARY64
        sizeof(TestAllocator<scudo::FuchsiaConfig>),
#endif
        sizeof(TestAllocator<scudo::AndroidSvelteConfig>),
        sizeof(TestAllocator<scudo::AndroidConfig>)
  });
  // To alleviate some problem, let's skip the thread safety analysis here.
  static void *get(size_t size) NO_THREAD_SAFETY_ANALYSIS {
    assert(size <= kMaxSize &&
           "Allocation size doesn't fit in the allocator storage");
    M.lock();
    return AllocatorStorage;
  }

  static void release(void *ptr) NO_THREAD_SAFETY_ANALYSIS {
    M.assertHeld();
    M.unlock();
    ASSERT_EQ(ptr, AllocatorStorage);
  }

  static scudo::HybridMutex M;
  static uint8_t AllocatorStorage[kMaxSize];
};
scudo::HybridMutex TestAllocatorStorage::M;
alignas(kMaxAlign) uint8_t TestAllocatorStorage::AllocatorStorage[kMaxSize];
#else
struct TestAllocatorStorage {
  static void *get(size_t size) NO_THREAD_SAFETY_ANALYSIS {
    void *p = nullptr;
    EXPECT_EQ(0, posix_memalign(&p, kMaxAlign, size));
    return p;
  }
  static void release(void *ptr) NO_THREAD_SAFETY_ANALYSIS { free(ptr); }
};
#endif

template <typename Config>
void *TestAllocator<Config>::operator new(size_t size) {
  return TestAllocatorStorage::get(size);
}

template <typename Config>
void TestAllocator<Config>::operator delete(void *ptr) {
  TestAllocatorStorage::release(ptr);
}
leonardchan marked 4 inline comments as done.
This revision is now accepted and ready to land.Aug 31 2023, 3:08 PM