Page MenuHomePhabricator

[Support] Introduce ThreadSafeAllocator
AcceptedPublic

Authored by steven_wu on Sep 12 2022, 10:40 AM.

Details

Summary

Add support for ThreadSafeAllocator, which is needed for a CAS
implementation, which requires thread safe allocation for data storage.

Diff Detail

Unit TestsFailed

TimeTest
60,220 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded::vloxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/16/include -nostdsysteminc -triple riscv64 -target-feature +v -target-feature +zfh -target-feature +experimental-zvfh -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded/vloxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -passes=mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded/vloxseg.c
60,180 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/16/include -nostdsysteminc -triple riscv64 -target-feature +v -target-feature +zfh -target-feature +experimental-zvfh -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -passes=mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded/vluxseg.c
60,190 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded::vloxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/16/include -nostdsysteminc -triple riscv64 -target-feature +v -target-feature +zfh -target-feature +experimental-zvfh -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded/vloxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -passes=mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded/vloxseg.c
60,220 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/16/include -nostdsysteminc -triple riscv64 -target-feature +v -target-feature +zfh -target-feature +experimental-zvfh -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -passes=mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded/vluxseg.c
60,060 msx64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/LeakTest.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/fuzzer-leak.test.tmp-LeakTest
View Full Test Results (6 Failed)

Event Timeline

steven_wu created this revision.Sep 12 2022, 10:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2022, 10:40 AM
steven_wu requested review of this revision.Sep 12 2022, 10:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2022, 10:40 AM
benlangmuir added inline comments.Sep 15 2022, 11:51 AM
llvm/include/llvm/Support/ThreadSafeAllocator.h
20

"an unfair lock" should say "a spin lock". You can have blocking unfair locks, it just means they don't provide guarantees against starvation under contention. Similarly the paragraph below should not talk about fair/unfair since that's not the right distinction.

39

I'm a bit skeptical of = 1. I think this comes from SpecificBumpPtrAllocator, where the memory is typed, but for e.g. ThreadSafeAllocator<BumpPtrAllocator> this will allocate a single byte. I would just drop the default value, but if you want to keep it maybe do some sfinae magic to only apply on SpecificBumpPtrAllocator?

testing might be more robust if a mock allocator were used underneath this allocator? that mock allocator could block/wait to intentionally induce contention and demonstrate that the threads do block/wait appropriately?

llvm/include/llvm/Support/ThreadSafeAllocator.h
26

?

aganea added a subscriber: aganea.Sep 27 2022, 8:15 AM
steven_wu updated this revision to Diff 486642.Jan 5 2023, 11:39 AM

Address review feedback

steven_wu updated this revision to Diff 486648.Jan 5 2023, 12:19 PM

Fix non-assert build test

dblaikie added inline comments.Jan 5 2023, 1:47 PM
llvm/unittests/Support/ThreadSafeAllocatorTest.cpp
28

timers like this risk flaky failures if the system is under load such that the delay is inadequate to induce contention - they also make tests unnecessarily slow. Could you do this with locking/flags instead, ensuring that events occur as required?

@dexonsmith Move the discussion here.

Here's a sketch that I think mostly works?

// Block for bump allocations.
struct BumpBlock {
  std::atomic<char *> Ptr;
  std::atomic<BumpBlock *> Next;
  char Bytes[4084];
  BumpBlock() : Ptr{Bytes}, Next{nullptr} {}

  // Compute new "Next", try to bump if there's space, else return nullptr.
  void *tryAllocate(size_t N, size_t Align = 1);
};

// Tail-allocated data for "big" allocations.
struct BumpSeparate {
  std::atomic<BumpSeparate *> Next;
  // tail-allocated data of appropriate and alignment, using malloc....
};

// Allocator.
struct BumpAllocator {
  std::atomic<BumpBlock *> CurrentBlock;
  std::atomic<BumpSeparate *> LastAllocSeparately;

  // Delete everything since there's no ownership here...
  ~BumpAllocator();

  void *allocate(size_t N, size_t Align = 1) {
    if (N > 2048)
      return allocateSeparately(N, Align);

    BumpBlock *B = CurrentBlock;
    std::unique_ptr<BumpBlock> New;
    void *NewAlloc = nullptr;
    while (true) {
      if (LLVM_LIKELY(B))
        if (void *Alloc = B->tryAllocate(N, Align))
          return Alloc;

      if (!New) {
        New = new BumpBlock;
        NewAlloc = New->tryAllocate(N, Align);
        assert(NewAlloc && "Empty block doesn't have space!");
      }
      if (!CurrentBlock.compare_exchange_weak(B, New.get()))
        continue;

      // New was saved in CurrentBlock. Fix its "Next" pointer and release it
      // so it's not deallocated.
      New->Next = B;
      New.release();
      return NewAlloc;
    }
  }

private:
  // Maintain a list of "big" allocations, similar to above.
  void *allocateSeparately(size_t N, size_t Align);
};

Not saying it needs to block this review...

But having a fast concurrent BumpPtrAllocator would be independently useful, and I'd suggest optimizing the allocator before bloating the default trie size.

Yes, the FIXME for a ThreadSafeBumpPtrAllocator is still there. Currently, I don't think it is urgent to fix. It is not expected to have someone to use Trie as a high performance thread safe set/map.
The immediate use case is as a high performance thread-safe data store. In the CAS use case we're expecting a lot of duplicate insertions which don't happen to hit the allocator, but I'm not sure that's really been checked/measured.

The algorithm works. Your sketch can have a small allocation in a slab when a conflict happens but that should be easy to fix.

The reason I don't want to have the CAS blocked by a better allocator is that I need to write and test it, also figure out if it needs to have any code sharing with regular BumpPtrAllocator.

steven_wu updated this revision to Diff 486932.Jan 6 2023, 10:43 AM

Update unittest to wait and check.

The reason I don't want to have the CAS blocked by a better allocator is that I need to write and test it, also figure out if it needs to have any code sharing with regular BumpPtrAllocator.

Seems reasonable. Can be done later.

Also the other approach mentioned in the true review (with thread-local variables) could be interesting to benchmark. Would use more memory, but maybe that’s okay, and it has the advantage of being super simple just like the lock approach.

dblaikie accepted this revision.Tue, Jan 10, 3:57 PM

@benlangmuir did you have outstanding feedback on this patch? Otherwise it's fine by me, I think.

llvm/include/llvm/Support/ThreadSafeAllocator.h
39–47

Might be nice to implement these with applyLocked - to keep the locking in one place - not that three short places close together are a real problem

49–52

If this is in a template/header anyway, and pretty short, maybe make it a template that takes a functor, rather than adding the overhead of llvm::function_refs type erasure?

llvm/unittests/Support/ThreadSafeAllocatorTest.cpp
39

This name's a bit confusing to me - as this doesn't test or set the IsBusy flag, but has Busy in the name.

Make slightly longer or more descriptive names would make it clear what each of these member functions is for/does?

This revision is now accepted and ready to land.Tue, Jan 10, 3:57 PM
jloser added a subscriber: jloser.Mon, Jan 16, 2:04 PM
jloser added inline comments.
llvm/include/llvm/Support/ThreadSafeAllocator.h
39–47

+1 for just delegating to applyLocked (if we make it a function template) rather than delegating to something that uses type erasure IMO.

49–52

+1 for making this a function template

llvm/unittests/Support/ThreadSafeAllocatorTest.cpp
39

+1 as well re the naming/confusion to a new reader of the code (such as myself)

steven_wu updated this revision to Diff 489862.Tue, Jan 17, 9:41 AM
steven_wu marked an inline comment as done.

Address review feedback

Formatting and minor update.