This is an archive of the discontinued LLVM Phabricator instance.

[Support] Add PerThreadBumpPtrAllocator class.
ClosedPublic

Authored by avl on Jan 22 2023, 3:12 PM.

Details

Summary

PerThreadBumpPtrAllocator allows separating allocations by thread id,
which makes allocations race-free. This is possible because the
ThreadPoolExecutor class creates threads, keeps them alive until the
destructor of ThreadPoolExecutor is called, and assigns ids to the
threads. Thus PerThreadBumpPtrAllocator should only be used with
threads created by ThreadPoolExecutor. This allocator is useful when a
thread-safe BumpPtrAllocator is needed.
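For illustration, a minimal sketch of the underlying idea (hypothetical code, not the actual patch):

#include "llvm/Support/Allocator.h"
#include "llvm/Support/Parallel.h"
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch: one BumpPtrAllocator per executor thread, selected
// by the thread's stable index, so no two threads ever contend on the
// same allocator.
class PerThreadBumpPtrAllocatorSketch {
  std::vector<llvm::BumpPtrAllocator> Allocators;

public:
  explicit PerThreadBumpPtrAllocatorSketch(size_t NumThreads)
      : Allocators(NumThreads) {}

  void *Allocate(size_t Size, size_t Alignment) {
    // getThreadIndex() is only meaningful on ThreadPoolExecutor threads,
    // hence the restriction described above.
    assert(llvm::parallel::getThreadIndex() < Allocators.size());
    return Allocators[llvm::parallel::getThreadIndex()].Allocate(Size,
                                                                 Alignment);
  }
};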

Diff Detail

Event Timeline

avl created this revision.Jan 22 2023, 3:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 22 2023, 3:12 PM
avl requested review of this revision.Jan 22 2023, 3:12 PM

It's nice how simple this is!

Thus ThreadPoolAllocator should only be used with threads created by ThreadPoolExecutor.

I hadn't thought deeply about this limitation previously...

This seems fine for many APIs. But I don't think the CAS could use this, since clients might share a CAS between multiple threads NOT owned by the ThreadPoolExecutor. For the CAS, we probably need to stick with https://reviews.llvm.org/D133713 for now (and probably eventually use a lock-free ConcurrentBumpPtrAllocator instead).

I agree that this probably doesn't work for CAS, since the use of CAS is not bound to any context like a ThreadPoolExecutor; for example, it is currently a legal use case to have multiple thread pools inserting into the CAS at the same time. It is not feasible to put such a restriction on CAS, since CAS should be safe to read/write concurrently, even from different processes.

For this PoolAllocator, it might be a good idea to require it to be initialized with an instance of ThreadPoolExecutor, and maybe add a method for a Thread to get the context of its Executor so you can double-check that the allocator is used in the correct Executor when assertions are on.

avl added a comment.Jan 23 2023, 3:15 PM

I agree that this probably doesn't work for CAS, since the use of CAS is not bound to any context like a ThreadPoolExecutor; for example, it is currently a legal use case to have multiple thread pools inserting into the CAS at the same time. It is not feasible to put such a restriction on CAS, since CAS should be safe to read/write concurrently, even from different processes.

For this PoolAllocator, it might be a good idea to require it to be initialized with an instance of ThreadPoolExecutor, and maybe add a method for a Thread to get the context of its Executor so you can double-check that the allocator is used in the correct Executor when assertions are on.

I see. My first intention was to implement a similar solution, but it requires additional refactoring (e.g. ThreadPoolExecutor is not currently visible). This patch is a simple working implementation that allows checking the idea.

I was thinking about something like this:

class ThreadPoolAllocator {
  void *Allocate(size_t size, size_t alignment);
};

class ThreadPool {
  std::unique_ptr<ThreadPoolAllocator> getAllocator();
};

class ThreadPoolExecutor {
  std::unique_ptr<ThreadPoolAllocator> getAllocator();
};
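For illustration, usage might then look like this (hypothetical, since getAllocator() does not exist yet):

ThreadPool Pool;
std::unique_ptr<ThreadPoolAllocator> Alloc = Pool.getAllocator();
Pool.async([&Alloc]() {
  // Allocation goes through the pool's allocator; no external locking.
  void *Mem = Alloc->Allocate(/*size=*/64, /*alignment=*/8);
  // ... use Mem; memory is released when the allocator is destroyed ...
});
Pool.wait();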
avl updated this revision to Diff 492435.Jan 26 2023, 6:50 AM

rebased. renamed.

avl retitled this revision from [Support] Add ThreadPoolAllocator class. to [Support] Add PerThreadBumpPtrAllocator class..Jan 26 2023, 6:52 AM
avl edited the summary of this revision. (Show Details)
avl updated this revision to Diff 500447.Feb 25 2023, 1:09 PM

rebased.

avl updated this revision to Diff 512887.Apr 12 2023, 10:05 AM

rebased.

avl added a subscriber: andrewng.

@MaskRay @andrewng Would you mind taking a look at this review, please? I need a thread-safe BumpPtrAllocator to use with parallelFor().

Will read through this patch soon! Back from a long vacation...

MaskRay added inline comments.Apr 12 2023, 3:17 PM
llvm/include/llvm/Support/PerThreadBumpPtrAllocator.h
32

Initialize Allocators instead of initializing it to empty and then calling resize.

37

Delete this comment and the blank line above. One // Pull in base class overloads. suffices. Actually, the code explains itself, so I personally prefer removing the comment completely.

95

Since this cannot be resized, we can use unique_ptr<AllocatorTy[]> and a separate size variable.
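For illustration, a sketch of that suggested layout (member names are assumptions):

#include <cassert>
#include <cstddef>
#include <memory>

// The number of allocators is fixed at construction time, so a plain heap
// array plus a size can replace a resizable std::vector.
template <typename AllocatorTy> class PerThreadAllocatorStorage {
  std::unique_ptr<AllocatorTy[]> Allocators;
  size_t NumOfAllocators;

public:
  explicit PerThreadAllocatorStorage(size_t NumThreads)
      : Allocators(std::make_unique<AllocatorTy[]>(NumThreads)),
        NumOfAllocators(NumThreads) {}

  AllocatorTy &get(size_t Idx) {
    assert(Idx < NumOfAllocators);
    return Allocators[Idx];
  }
  size_t size() const { return NumOfAllocators; }
};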

llvm/lib/Support/Parallel.cpp
164

A thread ID is smaller than this number, so the name seems confusing. Use getNumThreads?

avl updated this revision to Diff 513255.Apr 13 2023, 8:44 AM

addressed comments. added getThreadLocalAllocator().
grouped methods by whether or not they can be called asynchronously.

MaskRay accepted this revision.Apr 13 2023, 1:25 PM

I have a suggested comment and some nits, otherwise looks good. It'd be good to have @andrewng's review as well.

llvm/include/llvm/Support/PerThreadBumpPtrAllocator.h
19

Suggest:

PerThreadAllocator is used in conjunction with ThreadPoolExecutor to allow per-thread allocations. It wraps a possibly thread-unsafe allocator, e.g. BumpPtrAllocator.

PerThreadAllocator must be used inside ThreadPoolExecutor as it utilizes getThreadIndex, which is set by ThreadPoolExecutor threads.

llvm/unittests/Support/PerThreadBumpPtrAllocatorTest.cpp
22

delete blank line

42

5000 seems too much. 1000 suffices.

This revision is now accepted and ready to land.Apr 13 2023, 1:25 PM

I still have a general concern: this utility isn't safe to use in general LLVM library code, and while that's documented in the header, there's nothing enforcing that or checking for it. I think it'd be easy to get this wrong, and our existing test coverage would be unlikely to catch mistakes, but it could be a big problem for tools/libraries that have their own thread pools and depend on LLVM code.

Is there any way of adding assertions, or a structural change, to confirm this hasn't been used in the wrong place? I think it would be okay to add a bit of API surface to ThreadPoolExecutor, or add some new fields under LLVM_ENABLE_API_BREAKING_CHECKS.

avl added a comment.Apr 13 2023, 2:09 PM

I still have a general concern: this utility isn't safe to use in general LLVM library code, and while that's documented in the header, there's nothing enforcing that or checking for it. I think it'd be easy to get this wrong, and our existing test coverage would be unlikely to catch mistakes, but it could be a big problem for tools/libraries that have their own thread pools and depend on LLVM code.

Is there any way of adding assertions, or a structural change, to confirm this hasn't been used in the wrong place? I think it would be okay to add a bit of API surface to ThreadPoolExecutor, or add some new fields under LLVM_ENABLE_API_BREAKING_CHECKS.

A possible solution might be initializing threadIndex to some unrelated value by default,
e.g. setting threadIndex to -1. Threads created by ThreadPoolExecutor would have indexes in the range 0 ... ThreadsNum.
That would trigger the assertion "assert(getThreadIndex() < NumOfAllocators);" inside PerThreadAllocator methods for wrong threads. Does that sound OK?

I still have a general concern: this utility isn't safe to use in general LLVM library code, and while that's documented in the header, there's nothing enforcing that or checking for it. I think it'd be easy to get this wrong, and our existing test coverage would be unlikely to catch mistakes, but it could be a big problem for tools/libraries that have their own thread pools and depend on LLVM code.

Is there any way of adding assertions, or a structural change, to confirm this hasn't been used in the wrong place? I think it would be okay to add a bit of API surface to ThreadPoolExecutor, or add some new fields under LLVM_ENABLE_API_BREAKING_CHECKS.

A possible solution might be initializing threadIndex to some unrelated value by default,
e.g. setting threadIndex to -1. Threads created by ThreadPoolExecutor would have indexes in the range 0 ... ThreadsNum.
That would trigger the assertion "assert(getThreadIndex() < NumOfAllocators);" inside PerThreadAllocator methods for wrong threads. Does that sound OK?

When a llvm::parallel::parallel_* function returns, the ThreadPoolExecutor doesn't reset llvm::parallel::threadIndex. So the check won't be effective for a PerThreadBumpPtrAllocator misuse after a parallel_* function call.

Any expensive check mechanism should also support nested llvm::parallel::parallel_* calls (even if the inner calls are serial).

I don't have a good approach off the top of my head...

A possible solution might be initializing threadIndex to some unrelated value by default,
e.g. setting threadIndex to -1. Threads created by ThreadPoolExecutor would have indexes in the range 0 ... ThreadsNum.
That would trigger the assertion "assert(getThreadIndex() < NumOfAllocators);" inside PerThreadAllocator methods for wrong threads. Does that sound OK?

Sounds reasonable to me. Maybe we'd want to keep zero-init for non-assertions builds to avoid unnecessary static initializers?

When a llvm::parallel::parallel_* function returns, the ThreadPoolExecutor doesn't reset llvm::parallel::threadIndex. So the check won't be effective for a PerThreadBumpPtrAllocator misuse after a parallel_* function call.

I don't think we need to catch every possible failure. It sounds useful (even if not exhaustive) to catch misuse in the cases where no llvm::parallel code has been called yet. Then, regular unit test coverage for such library code can trigger assertion failures...

Any expensive check mechanism

Note, LLVM_ENABLE_ABI_BREAKING_CHECKS is on by default in assertions builds. This is different from LLVM_ENABLE_EXPENSIVE_CHECKS. I think if we can't make the assertion cheap enough to be on by default in assertions builds, it probably isn't worth it.

(Not sure @avl's suggestion is ABI-breaking anyway so it's probably moot.)

BTW, if others feel strongly that such an assertion wouldn't be useful (say, maybe there's reason to believe that even unit tests wouldn't trigger it in practice due to @MaskRay's points?), happy to back away and let this land without it.

avl added a comment.Apr 14 2023, 4:43 AM

When a llvm::parallel::parallel_* function returns, the ThreadPoolExecutor doesn't reset llvm::parallel::threadIndex. So the check won't be effective for a PerThreadBumpPtrAllocator misuse after a parallel_* function call.

Even if it would not catch all the cases, it would probably still be useful to catch some of them.
E.g. the following cases would be caught:

PerThreadBumpPtrAllocator Allocator;

parallelFor() {
   Allocator.Allocate();
}

Allocator.Allocate();   << the call is done on the main thread.
                        << assertion should be raised.

PerThreadBumpPtrAllocator Allocator;
ThreadPool Pool(parallel::strategy);
Pool.async([&]() { Allocator.Allocate(); });  << the call is done on a thread not created by ThreadPoolExecutor.
                                              << assertion should be raised.

Could that be useful?

It could also probably be done in a more general way:

#if LLVM_ENABLE_ASSERTIONS
thread_local unsigned threadIndex = -1;
#else
thread_local unsigned threadIndex;
#endif
inline unsigned getThreadIndex() { 
  assert(threadIndex != -1);
  return threadIndex;
}
avl added a comment.Apr 14 2023, 6:23 AM

It could also probably be done in a more general way:

#if LLVM_ENABLE_ASSERTIONS
thread_local unsigned threadIndex = -1;
#else
thread_local unsigned threadIndex;
#endif
inline unsigned getThreadIndex() { 
  assert(threadIndex != -1);
  return threadIndex;
}

However, the above suggestion forbids usage of the main thread, which is currently used by parallelFor(). Thus either the above check should not be done, or parallelFor() should be refactored to not use the main thread.

It could also probably be done in a more general way:

#if LLVM_ENABLE_ASSERTIONS
thread_local unsigned threadIndex = -1;
#else
thread_local unsigned threadIndex;
#endif
inline unsigned getThreadIndex() { 
  assert(threadIndex != -1);
  return threadIndex;
}

However, the above suggestion forbids usage of the main thread, which is currently used by parallelFor(). Thus either the above check should not be done, or parallelFor() should be refactored to not use the main thread.

Yes, any checking would need to somehow handle the case where parallelFor() falls back to using the main thread or this optimisation would need to be dropped. Also, IIRC, some of the other parallel methods may/will still use the main thread concurrently with threads from the thread pool.

I don't really have much time right now, but I've scanned the change and it LGTM, barring this concern over "safety of usage"; a check for that would be good to have if it can be done relatively easily.

avl updated this revision to Diff 513631.Apr 14 2023, 9:27 AM

addressed comments.

avl added a comment.Apr 14 2023, 9:31 AM

So far I suggest implementing the safety check as a separate patch, after having a good solution for it.

So far I suggest implementing the safety check as a separate patch, after having a good solution for it.

Looks good to me.

So far I suggest implementing the safety check as a separate patch, after having a good solution for it.

Looks good to me.

I think if we don't add the check now it's unlikely to happen later.

It could also probably be done in a more general way:

#if LLVM_ENABLE_ASSERTIONS
thread_local unsigned threadIndex = -1;
#else
thread_local unsigned threadIndex;
#endif
inline unsigned getThreadIndex() { 
  assert(threadIndex != -1);
  return threadIndex;
}

However, the above suggestion forbids usage of the main thread, which is currently used by parallelFor(). Thus either the above check should not be done, or parallelFor() should be refactored to not use the main thread.

Yes, any checking would need to somehow handle the case where parallelFor() falls back to using the main thread or this optimisation would need to be dropped. Also, IIRC, some of the other parallel methods may/will still use the main thread concurrently with threads from the thread pool.

Seems like threads are assigned IDs from 1 in the ThreadPoolExecutor constructor via calls to work(). The main thread assigns threadIndex to 0 in the same place:

explicit ThreadPoolExecutor(ThreadPoolStrategy S = hardware_concurrency()) {
  unsigned ThreadCount = S.compute_thread_count();
  // Spawn all but one of the threads in another thread as spawning threads
  // can take a while.
  Threads.reserve(ThreadCount);
  Threads.resize(1);
  std::lock_guard<std::mutex> Lock(Mutex);
  Threads[0] = std::thread([this, ThreadCount, S] {
    for (unsigned I = 1; I < ThreadCount; ++I) {
      Threads.emplace_back([=] { work(S, I); });
      if (Stop)
        break;
    }
    ThreadsCreated.set_value();
    work(S, 0);
  });
}

void work(ThreadPoolStrategy S, unsigned ThreadID) {
  threadIndex = ThreadID;

Can you explain what will go wrong with the main thread?

So far I suggest implementing the safety check as a separate patch, after having a good solution for it.

Looks good to me.

I think if we don't add the check now it's unlikely to happen later.

(Although I do think an independent patch makes sense so it can be reverted independently if it causes trouble.)

Seems like threads are assigned IDs from 1 in the ThreadPoolExecutor constructor via calls to work(). The main thread assigns threadIndex to 0 in the same place:

Aha, looks like I misread the code. The work() calls are coming from within a lambda that's executed by the first created thread. So, right now, the main thread has the same threadIndex as the first spawned thread.

(But if that's the case, doesn't that cause a problem for the allocator? Doesn't the allocator require that the main thread has a different ID from the worker threads?)

Assuming it's okay for the main thread to alias the worker threads, what if we just check if the ThreadPoolExecutor has been constructed?

// Parallel.h
bool hasDefaultExecutor();
unsigned getThreadIndex() {
  assert(hasDefaultExecutor());
  return threadIndex;
}

// Parallel.cpp
static std::atomic<bool> HasDefaultExecutor = false;
bool hasDefaultExecutor() {
  return HasDefaultExecutor;
}

Executor *Executor::getDefaultExecutor() {
  HasDefaultExecutor = true;

Seems like threads are assigned IDs from 1 in the ThreadPoolExecutor constructor via calls to work(). The main thread assigns threadIndex to 0 in the same place:

Aha, looks like I misread the code. The work() calls are coming from within a lambda that's executed by the first created thread. So, right now, the main thread has the same threadIndex as the first spawned thread.

(But if that's the case, doesn't that cause a problem for the allocator? Doesn't the allocator require that the main thread has a different ID from the worker threads?)

Specifically, if the main thread is doing work concurrently with Thread0, won't they be using the same slice of the PerThreadAllocator?

avl added a comment.Apr 14 2023, 1:41 PM

Seems like threads are assigned IDs from 1 in the ThreadPoolExecutor constructor via calls to work(). The main thread assigns threadIndex to 0 in the same place:

Aha, looks like I misread the code. The work() calls are coming from within a lambda that's executed by the first created thread. So, right now, the main thread has the same threadIndex as the first spawned thread.

(But if that's the case, doesn't that cause a problem for the allocator? Doesn't the allocator require that the main thread has a different ID from the worker threads?)

Specifically, if the main thread is doing work concurrently with Thread0, won't they be using the same slice of the PerThreadAllocator?

My understanding is that the main thread should never be doing work concurrently with Thread0. If that were the case, the problem would exist even without PerThreadAllocator. A bug where exactly that happened (the main thread doing work concurrently with Thread0) was recently fixed in D142317.

Thus the current assumption is that the main thread never works concurrently with Thread0. Because they never work concurrently, it is OK to have slice 0 for both: the main thread and Thread0.

avl added a comment.Apr 14 2023, 1:43 PM

So far I suggest implementing the safety check as a separate patch, after having a good solution for it.

Looks good to me.

I think if we don't add the check now it's unlikely to happen later.

(Although I do think an independent patch makes sense so it can be reverted independently if it causes trouble.)

I am OK with doing that separate patch right after the current one. I just do not have a good idea for it at the moment.

I am OK with doing that separate patch right after the current one. I just do not have a good idea for it at the moment.

WDYT of the idea above, to have a Boolean flag that checks whether getDefaultExecutor() has been called, and assert on that in getThreadIndex()?

avl added a comment.Apr 14 2023, 2:13 PM

I am OK with doing that separate patch right after the current one. I just do not have a good idea for it at the moment.

WDYT of the idea above, to have a Boolean flag that checks whether getDefaultExecutor() has been called, and assert on that in getThreadIndex()?

I think this solves only part of the problem: it checks that the executor has already been created when getThreadIndex() is requested, but it does not check that the thread index is valid. If a thread was not created by ThreadPoolExecutor, it would have index zero, which clashes with the thread index of the main thread and Thread0. I thought we wanted to check that other threads are not used with getThreadIndex.

Checking ThreadPoolExecutor existence is still a useful check, and it would be good to implement it. If we found a good way to check thread indexes, that would also be useful.

I am OK with doing that separate patch right after the current one. I just do not have a good idea for it at the moment.

WDYT of the idea above, to have a Boolean flag that checks whether getDefaultExecutor() has been called, and assert on that in getThreadIndex()?

I think this solves only part of the problem: it checks that the executor has already been created when getThreadIndex() is requested, but it does not check that the thread index is valid. If a thread was not created by ThreadPoolExecutor, it would have index zero, which clashes with the thread index of the main thread and Thread0. I thought we wanted to check that other threads are not used with getThreadIndex.

Checking ThreadPoolExecutor existence is still a useful check, and it would be good to implement it. If we found a good way to check thread indexes, that would also be useful.

Yeah, seems like a good start for now. This would catch the case where someone is NOT using llvm::parallel at all, but has a bunch of threads, and is wrongly assuming this allocator is safe for concurrent use in general.

Checking the thread indexes seems hard, since the "main" thread could be a different client thread on different entries to llvm::parallel.

avl added a comment.Apr 15 2023, 2:38 AM

I think this solves only part of the problem: it checks that the executor has already been created when getThreadIndex() is requested, but it does not check that the thread index is valid. If a thread was not created by ThreadPoolExecutor, it would have index zero, which clashes with the thread index of the main thread and Thread0. I thought we wanted to check that other threads are not used with getThreadIndex.

Checking ThreadPoolExecutor existence is still a useful check, and it would be good to implement it. If we found a good way to check thread indexes, that would also be useful.

Yeah, seems like a good start for now. This would catch the case where someone is NOT using llvm::parallel at all, but has a bunch of threads, and is wrongly assuming this allocator is safe for concurrent use in general.

This check will help for pure users of getThreadIndex() but will not help users of PerThreadBumpPtrAllocator as it calls "detail::Executor::getDefaultExecutor()->getThreadsNum();" in the constructor. Thus any call to getThreadIndex() after PerThreadBumpPtrAllocator is created will have HasDefaultExecutor == true.

Checking the thread indexes seems hard, since the "main" thread could be a different client thread on different entries to llvm::parallel.

I think this solves only part of the problem: it checks that the executor has already been created when getThreadIndex() is requested, but it does not check that the thread index is valid. If a thread was not created by ThreadPoolExecutor, it would have index zero, which clashes with the thread index of the main thread and Thread0. I thought we wanted to check that other threads are not used with getThreadIndex.

Checking ThreadPoolExecutor existence is still a useful check, and it would be good to implement it. If we found a good way to check thread indexes, that would also be useful.

Yeah, seems like a good start for now. This would catch the case where someone is NOT using llvm::parallel at all, but has a bunch of threads, and is wrongly assuming this allocator is safe for concurrent use in general.

This check will help for pure users of getThreadIndex() but will not help users of PerThreadBumpPtrAllocator as it calls "detail::Executor::getDefaultExecutor()->getThreadsNum();" in the constructor. Thus any call to getThreadIndex() after PerThreadBumpPtrAllocator is created will have HasDefaultExecutor == true.

I see; I've been missing that detail.

You could introduce a free function, getDefaultThreadsNum(), in Parallel.h/cpp, which in assertions mode calls hardware_concurrency().compute_thread_count() (redundantly) without calling getDefaultExecutor(). WDYT?
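For illustration, a rough sketch of that suggestion (hypothetical; getDefaultThreadsNum() does not exist yet):

// Parallel.h (hypothetical)
unsigned getDefaultThreadsNum();

// Parallel.cpp (hypothetical)
unsigned getDefaultThreadsNum() {
#if LLVM_ENABLE_ASSERTIONS
  // Recompute the count without calling getDefaultExecutor(), so querying
  // the size does not mark the default executor as created.
  return hardware_concurrency().compute_thread_count();
#else
  return detail::Executor::getDefaultExecutor()->getThreadsNum();
#endif
}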

avl added a comment.Apr 17 2023, 3:07 AM

This check will help for pure users of getThreadIndex() but will not help users of PerThreadBumpPtrAllocator as it calls "detail::Executor::getDefaultExecutor()->getThreadsNum();" in the constructor. Thus any call to getThreadIndex() after PerThreadBumpPtrAllocator is created will have HasDefaultExecutor == true.

I see; I've been missing that detail.

You could introduce a free function, getDefaultThreadsNum(), in Parallel.h/cpp, which in assertions mode calls hardware_concurrency().compute_thread_count() (redundantly) without calling getDefaultExecutor(). WDYT?

Originally, I tried to avoid such duplication, as there is no guarantee that the default executor uses exactly hardware_concurrency().compute_thread_count() threads. If the default executor strategy were changed, it would be easy to forget to update the getDefaultThreadsNum() function. Probably it could be verified by adding more asserts, though.
Anyway, I tend to lean towards initializing threadIndex with -1:

thread_local unsigned threadIndex = -1;
inline unsigned getThreadIndex() {
  assert((threadIndex != -1) || (parallel::strategy.ThreadsRequested == 1));
  return threadIndex;
}

This solution guarantees that getThreadIndex() is only used with threads created by ThreadPoolExecutor.
It also helps to increase the level of parallelization by allowing parallel execution of llvm::parallelFor().
Currently, there is a limitation that two TaskGroups cannot run in parallel:

// Latch::sync() called by the dtor may cause one thread to block. It is a
// deadlock if all threads in the default executor are blocked. To prevent the
// deadlock, only allow the first TaskGroup to run tasks in parallel. In the
// scenario of nested parallel_for_each(), only the outermost one runs in
// parallel.
TaskGroup::TaskGroup() : Parallel(TaskGroupInstances++ == 0) {}

This is done to avoid nested llvm::parallelFor(), but it also prevents parallel llvm::parallelFor():

ThreadPool Pool(parallel::strategy);
    
Pool.async([&]() { llvm::parallelFor(); });
Pool.async([&]() { llvm::parallelFor(); });

Pool.wait();

The above solution allows running llvm::parallelFor() in parallel without causing a deadlock.
Using threadIndex will make only nested llvm::parallelFor() calls sequential, while allowing parallel llvm::parallelFor():

TaskGroup::TaskGroup() : Parallel(threadIndex == -1) {}

Another thing is that using an assertion to check threadIndex will forbid calling getThreadIndex() on the main thread.
It will break this code in lld/ELF/Relocations.cpp:

// Deterministic parallelism needs sorting relocations which is unsuitable
// for -z nocombreloc. MIPS and PPC64 use global states which are not suitable
// for parallelism.
bool serial = !config->zCombreloc || config->emachine == EM_MIPS ||
              config->emachine == EM_PPC64;
parallel::TaskGroup tg;
for (ELFFileBase *f : ctx.objectFiles) {
  auto fn = [f]() {
    RelocationScanner scanner;
    for (InputSectionBase *s : f->getSections()) {
      if (s && s->kind() == SectionBase::Regular && s->isLive() &&
          (s->flags & SHF_ALLOC) &&
          !(s->type == SHT_ARM_EXIDX && config->emachine == EM_ARM))
        scanner.template scanSection<ELFT>(*s);
    }
  };
  if (serial)
    fn();   <<<<< called on the main thread, it calls getThreadIndex() inside.
  else
    tg.execute(fn);
}

// Both the main thread and thread pool index 0 use getThreadIndex()==0. Be
// careful that they don't concurrently run scanSections. When serial is
// true, fn() has finished at this point, so running execute is safe.

To solve this problem, it is possible to add a sequential mode to TaskGroup, so that tasks marked with a
"sequential" flag are always executed on Thread0. That makes them always execute in the same order.
It will also allow executing scanSection concurrently, as no indexes overlap anymore.

tg.execute(fn, serial);
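A rough sketch of what such a mode could look like (hypothetical names and signatures, not the actual patch):

#include <functional>

class TaskGroup {
  bool Parallel; // e.g. set as Parallel(threadIndex == -1) in the ctor

public:
  void execute(std::function<void()> F, bool Sequential = false) {
    if (!Parallel || Sequential)
      spawnOnThread0(std::move(F)); // same thread => same submission order,
                                    // and getThreadIndex() == 0 as required
    else
      spawnOnAnyThread(std::move(F)); // any worker thread
  }

private:
  // Hypothetical helpers standing in for the executor plumbing.
  void spawnOnThread0(std::function<void()> F);
  void spawnOnAnyThread(std::function<void()> F);
};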

Briefly, the list of changes:

  1. Initialize threadIndex with -1 and add an assertion:
thread_local unsigned threadIndex = -1;
inline unsigned getThreadIndex() {
  assert((threadIndex != -1) || (parallel::strategy.ThreadsRequested == 1));
  return threadIndex;
}
  2. Remove the NumItems optimization so that work is not executed on the main thread.
void llvm::parallelFor(size_t Begin, size_t End,
                       llvm::function_ref<void(size_t)> Fn) {
  // If we have zero or one items, then do not incur the overhead of spinning up
  // a task group.  They are surprisingly expensive, and because they do not
  // support nested parallelism, a single entry task group can block parallel
  // execution underneath them.
#if LLVM_ENABLE_THREADS
  auto NumItems = End - Begin;
  if (NumItems > 1 && parallel::strategy.ThreadsRequested != 1) {  <<< delete check for NumItems
  3. Use threadIndex to avoid nested parallelization:

TaskGroup::TaskGroup() : Parallel(threadIndex == -1) {}

  4. Add a sequential mode to TaskGroup, forcing marked tasks to be executed on the same thread in sequential order.

What do you think about that approach? It looks like it could give a good level of checking (no other threads are allowed to call getThreadIndex()) and helps to get more parallelization.

This sounds okay to me, but I admit I don't know llvm::parallel well enough to understand the implications.

@MaskRay, WDYT?

Anyway, I tend to lean towards initializing threadIndex with -1:

This looks good to me to avoid a collision between the main thread and thread pool thread 0. This also fulfills the "remove zero initialization." note in a comment in D142317.

It will break this code in lld/ELF/Relocations.cpp:

if (serial)
  fn();   <<<<< called on the main thread, it calls getThreadIndex() inside.
else
  tg.execute(fn);

Yes. It seems that we need a serial tg.xxxxx that makes getThreadIndex() == 0, even if llvm::parallel::strategy requests more than one thread.

I haven't thought through the other changes very clearly. The safeguard changes and discussions can go into another patch.

dexonsmith accepted this revision.Apr 17 2023, 11:32 AM

Sounds good; happy for this to land while you continue working on how to do the checking.

  2. Remove the NumItems optimization so that work is not executed on the main thread.

Sorry, but I'm quite busy right now, so I haven't been able to give this much attention. I did notice this whilst scanning the updates to this review, and IIRC this was quite an important optimisation for some use case (can't remember the details). So we might want to take some care before removing it.

avl added a comment.Apr 18 2023, 4:58 AM

  2. Remove the NumItems optimization so that work is not executed on the main thread.

Sorry but I'm quite busy right now, so haven't been able to give this much attention. I did notice this whilst scanning the updates to this review and IIRC this was quite an important optimisation for some use case (can't remember the details). So might want to take some care before removing it.

Yep. I am going to create an appropriate set of reviews so that it is possible to pay attention to each aspect separately.

avl updated this revision to Diff 519567.May 4 2023, 10:54 AM

rebased.

andrewng accepted this revision.May 5 2023, 2:36 AM

Apart from the minor suggestion, all the parallel changes LGTM.

llvm/lib/Support/Parallel.cpp
109

Perhaps change getThreadsNum -> getNumThreads or getThreadCount?

avl added a comment.May 6 2023, 4:55 AM

Thank you for the review!

This revision was automatically updated to reflect the committed changes.