Add support for ThreadSafeAllocator, which is needed for a CAS
implementation, which requires thread safe allocation for data storage.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/include/llvm/Support/ThreadSafeAllocator.h | ||
---|---|---|
19 | "an unfair lock" should say "a spin lock". You can have blocking unfair locks, it just means they don't provide guarantees against starvation under contention. Similarly the paragraph below should not talk about fair/unfair since that's not the right distinction. | |
38 | I'm a bit skeptical of = 1. I think this comes from SpecificBumpPtrAllocator, where the memory is typed, but for e.g. ThreadSafeAllocator<BumpPtrAllocator> this will allocate a single byte. I would just drop the default value, but if you want to keep it maybe do some sfinae magic to only apply on SpecificBumpPtrAllocator? |
testing might be more robust if a mock allocator were used underneath this allocator? that mock allocator could block/wait to intentionally induce contention and demonstrate that the threads do block/wait appropriately?
llvm/include/llvm/Support/ThreadSafeAllocator.h | ||
---|---|---|
25 | ? |
llvm/unittests/Support/ThreadSafeAllocatorTest.cpp | ||
---|---|---|
28 | timers like this risk flaky failures if the system is under load such that the delay is inadequate to induce contention - they also make tests unnecessarily slow. Could you do this with locking/flags instead, ensuring that events occur as required? |
@dexonsmith Move the discussion here.
Here's a sketch that I think mostly works?
// Block for bump allocations. struct BumpBlock { std::atomic<char *> Ptr; std::atomic<BumpBlock *> Next; char Bytes[4084]; BumpBlock() : Ptr{Bytes}, Next{nullptr} {} // Compute new "Next", try to bump if there's space, else return nullptr. void *tryAllocate(size_t N, size_t Align = 1); }; // Tail-allocated data for "big" allocations. struct BumpSeparate { std::atomic<BumpSeparate *> Next; // tail-allocated data of appropriate and alignment, using malloc.... }; // Allocator. struct BumpAllocator { std::atomic<BumpBlock *> CurrentBlock; std::atomic<BumpSeparate *> LastAllocSeparately; // Delete everything since there's no ownership here... ~BumpAllocator(); void *allocate(size_t N, size_t Align = 1) { if (N > 2048) return allocateSeparately(N, Align); BumpBlock *B = CurrentBlock; std::unique_ptr<BumpBlock> New; void *NewAlloc = nullptr; while (true) { if (LLVM_LIKELY(B)) if (void *Alloc = B->tryAllocate(N, Align)) return Alloc; if (!New) { New = new BumpBlock; NewAlloc = New->tryAllocate(N, Align); assert(NewAlloc && "Empty block doesn't have space!"); } if (!CurrentBlock.compare_exchange_weak(B, New.get())) continue; // New was saved in CurrentBlock. Fix its "Next" pointer and release it // so it's not deallocated. New->Next = B; New.release(); return NewAlloc; } } private: // Maintain a list of "big" allocations, similar to above. void *allocateSeparately(size_t N, size_t Align); };Not saying it needs to block this review...
But having a fast concurrent BumpPtrAllocator would be independently useful, and I'd suggest optimizing the allocator before bloating the default trie size.
Yes, the FIXME for a ThreadSafeBumpPtrAllocator is still there. Currently, I don't think it is urgent to fix. It is not expected to have someone to use Trie as a high performance thread safe set/map.
The immediate use case is as a high performance thread-safe data store. In the CAS use case we're expecting a lot of duplicate insertions which don't happen to hit the allocator, but I'm not sure that's really been checked/measured.
The algorithm works. Your sketch can have a small allocation in a slab when a conflict happens but that should be easy to fix.
The reason I don't want to have the CAS blocked by a better allocator is that I need to write and test it, also figure out if it needs to have any code sharing with regular BumpPtrAllocator.
Seems reasonable. Can be done later.
Also the other approach mentioned in the true review (with thread-local variables) could be interesting to benchmark. Would use more memory, but maybe that’s okay, and it has the advantage of being super simple just like the lock approach.
@benlangmuir did you have outstanding feedback on this patch? Otherwise it's fine by me, I think.
llvm/include/llvm/Support/ThreadSafeAllocator.h | ||
---|---|---|
39–47 | Might be nice to implement these with applyLocked - to keep the locking in one place - not that three short places close together are a real problem | |
49–52 | If this is in a template/header anyway, and pretty short, maybe make it a template that takes a functor, rather than adding the overhead of llvm::function_refs type erasure? | |
llvm/unittests/Support/ThreadSafeAllocatorTest.cpp | ||
39 | This name's a bit confusing to me - as this doesn't test or set the IsBusy flag, but has Busy in the name. Make slightly longer or more descriptive names would make it clear what each of these member functions is for/does? |
llvm/include/llvm/Support/ThreadSafeAllocator.h | ||
---|---|---|
39–47 | +1 for just delegating to applyLocked (if we make it a function template) rather than delegating to something that uses type erasure IMO. | |
49–52 | +1 for making this a function template | |
llvm/unittests/Support/ThreadSafeAllocatorTest.cpp | ||
39 | +1 as well re the naming/confusion to a new reader of the code (such as myself) |
"an unfair lock" should say "a spin lock". You can have blocking unfair locks, it just means they don't provide guarantees against starvation under contention. Similarly the paragraph below should not talk about fair/unfair since that's not the right distinction.