This is an archive of the discontinued LLVM Phabricator instance.

ThreadPool: grow the pool only as needed
ClosedPublic

Authored by Benoit on Dec 2 2021, 6:47 PM.

Details

Summary

On my 96-core cloudtop 'machine', it seems unnecessary to always start
96 threads upfront... particularly as the ThreadPool is created even
with -mlir-disable-threading. Things like the resuling spew in GDB and
the obfuscated output of (gdb) info threads are my motivation here,
but it probably also doesn't hurt for at least some efficiency metrics to
avoid creating many threads upfront.

Diff Detail

Event Timeline

Benoit created this revision.Dec 2 2021, 6:47 PM
Benoit requested review of this revision.Dec 2 2021, 6:47 PM
Benoit updated this revision to Diff 391527.Dec 2 2021, 6:54 PM

off by one mistake in max number of worker threads

Benoit added inline comments.Dec 2 2021, 7:05 PM
llvm/lib/Support/ThreadPool.cpp
24

Here actually I'm really not sure: what should be the max number of threads in Threads: should it be Strategy.compute_thread_count() or should it be that minus one to account for the main thread? it looks like the existing code was doing the former but (as actually confirmed by GDB) that meant it had one more thread running than the detected hardware concurrency. Which could be fine if we expect that the main thread would be idle most of the time? What was the intent here?

Benoit added inline comments.Dec 2 2021, 7:11 PM
llvm/include/llvm/Support/ThreadPool.h
68–70

Please decide what we want here? The immediate problem that led me to add +1 here was that when Threads was empty, I was now returning 0, and the caller was clearly expecting at least 1:

At mlir/include/mlir/IR/Threading.h:75

size_t numActions = std::min(numElements, threadPool.getThreadCount());
SmallVector<std::shared_future<void>> threadFutures;
threadFutures.reserve(numActions - 1);
for (unsigned i = 1; i < numActions; ++i)
  threadFutures.emplace_back(threadPool.async(processFn));
Benoit updated this revision to Diff 391531.Dec 2 2021, 7:39 PM

Got it, getThreadCount() was really meant to return the max number of threads, not the current number. Those two numbers used to be the same before this patch.

Benoit added inline comments.Dec 2 2021, 7:42 PM
llvm/include/llvm/Support/ThreadPool.h
68–70

Self-replying: this call site really shows that getThreadCount was expected to return the potential max number of threads, which incidentally was equal to the current number of threads before this diff, but that's what is changing here. To avoid breaking existing users, I reverted getThreadCount to this behavior (I believe that the earlier state of this diff would have effectively kept the number of threads to 1. Now I've checked in GDB that we do create a dozen threads for a simple lit test).

llvm/lib/Support/ThreadPool.cpp
24

Got it, nevermind, updated this diff (see reply to the other comment thread).

particularly as the ThreadPool is created even with -mlir-disable-threading.

I'm fairly sure I fixed this one or two months ago, at least the C++ API allows to setup a project without starting a ThreadPool at all. This was a bottleneck in TensorFlow for some small MLIR work we're doing there.

There might be some similar plumbing we could do in how we handle --mlir-disable-threading with mlir-opt as well, since it is a testing tool we haven't tried to "optimize" this kind of things (you're making a good case for it though!).

That said, I like this change to lazily create thread as we go: so that even in cases where I want parallelism, I won't create more threads than actually needed!

llvm/lib/Support/ThreadPool.cpp
27

Nit: remove trivial braces

mehdi_amini added inline comments.Dec 2 2021, 8:02 PM
llvm/include/llvm/Support/ThreadPool.h
70

I'm not fond of keeping the API name as-is with a new semantics.

What about removing it and using two APIs instead:

  • getMaxThreadCount()
  • getAvailableThreadCount() => return Threads.size()
Benoit updated this revision to Diff 391535.Dec 2 2021, 8:19 PM

Address review comments.

Benoit added a comment.Dec 2 2021, 8:24 PM

I'm fairly sure I fixed this one or two months ago, at least the C++ API allows to setup a project without starting a ThreadPool at all. This was a bottleneck in TensorFlow for some small MLIR work we're doing there.

There might be some similar plumbing we could do in how we handle --mlir-disable-threading with mlir-opt as well, since it is a testing tool we haven't tried to "optimize" this kind of things (you're making a good case for it though!).

Indeed, it is in mlir-opt that I was observing that behavior this week.

llvm/include/llvm/Support/ThreadPool.h
70

There are 3 call sites in one Clang file, and 2 call sites in 2 MLIR files. That's a little more than I feel save updating all in one shot. Are you OK to keep the current name for now, and defer to a follow-up? At least the behavior isn't changing, these existing users are still going to get effectively the same result value.

For the new methods: I agree that the current getThreadCount should get renamed getMaxThreadCount.

I don't feel comfortable with getAvailableThreadCount() => return Threads.size() because that would conflict with the meaning of "available" in the existing member AvailableThreads. How about getCurrentThreadCount ?

mehdi_amini added inline comments.Dec 2 2021, 8:24 PM
llvm/include/llvm/Support/ThreadPool.h
148

Nit: it won't have any impact right now, but can you guard this declaration with LLVM_ENABLE_THREADS?

That would lead to compile time failures instead of link-time failures if someone changes something incorrectly in the future.

mehdi_amini accepted this revision.Dec 2 2021, 8:26 PM

LG, thanks!

llvm/include/llvm/Support/ThreadPool.h
70

getCurrentThreadCount() LGTM

This revision is now accepted and ready to land.Dec 2 2021, 8:26 PM
Benoit updated this revision to Diff 391536.Dec 2 2021, 8:27 PM

review comment

Benoit marked an inline comment as done.Dec 2 2021, 8:27 PM

LG, thanks!

Thanks for the quick review! I don't have permissions to push by myself, FWIW.
I'm also running this through google global presubmits tonight... let's wait until tomorrow to submit this.

Sure, ping me tomorrow!

Benoit added a comment.EditedDec 3 2021, 12:34 PM

Wait a bit more before submitting... we're still debugging CI failures.

Benoit updated this revision to Diff 391723.Dec 3 2021, 1:29 PM

fix preexisting race condition in isWorkerThread found by MLIR tests with TSan

Benoit added a comment.Dec 3 2021, 1:30 PM

This should be good to submit now.

This revision was landed with ongoing or failed builds.Dec 3 2021, 1:48 PM
This revision was automatically updated to reflect the committed changes.
kda added a subscriber: kda.Dec 3 2021, 6:03 PM

Looks like this is breaking the buildbot:
https://lab.llvm.org/buildbot/#/builders/37/builds/8903

kda added a comment.Dec 3 2021, 6:05 PM

salient log extract:

[127/558] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o
FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o 
/b/sanitizer-x86_64-linux/build/llvm_build64/bin/clang++  -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Support -I/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support -Iinclude -I/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -m32 -fPIC -flto -Os -g0 -DNDEBUG -fno-rtti -fno-exceptions -nostdinc++ -I/b/sanitizer-x86_64-linux/build/symbolizer_build32/symbolizer/zlib -isystem /b/sanitizer-x86_64-linux/build/symbolizer_build32/symbolizer/libcxx/include/x86_64-unknown-linux-gnu/c++/v1 -isystem /b/sanitizer-x86_64-linux/build/symbolizer_build32/symbolizer/libcxx/include/c++/v1 -Wno-error=global-constructors   -std=c++14  -fno-exceptions -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o -c /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp
/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp:97:13: error: redefinition of 'ThreadPool'
ThreadPool::ThreadPool(ThreadPoolStrategy S)
            ^
/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/ThreadPool.h:43:3: note: previous definition is here
  ThreadPool(ThreadPoolStrategy S = hardware_concurrency())
  ^
/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp:98:7: error: initializer 'ThreadCount' does not name a non-static data member or base class; did you mean the member 'MaxThreadCount'?
    : ThreadCount(S.compute_thread_count()) {
      ^~~~~~~~~~~
      MaxThreadCount
/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/ThreadPool.h:177:18: note: 'MaxThreadCount' declared here
  const unsigned MaxThreadCount;
                 ^
/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp:99:7: error: use of undeclared identifier 'ThreadCount'
  if (ThreadCount != 1) {
      ^
/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp:100:56: error: use of undeclared identifier 'ThreadCount'; did you mean 'MaxThreadCount'?
    errs() << "Warning: request a ThreadPool with " << ThreadCount
                                                       ^~~~~~~~~~~
                                                       MaxThreadCount
/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/ThreadPool.h:177:18: note: 'MaxThreadCount' declared here
  const unsigned MaxThreadCount;
                 ^
4 errors generated.

Sorry! Pushed a fix in b28f317c8156 and will monitor the bot tonight.

https://lab.llvm.org/buildbot/ is broken for me right now though (502 Bad Gateway), does it work for you?

Benoit added a comment.Dec 3 2021, 7:16 PM

Sorry! Pushed a fix in b28f317c8156 and will monitor the bot tonight.

Sorry, all, for the breakage, and thanks for handling it while I was away!!
It looks like I never paid attention to the not-LLVM_ENABLE_THREADS path.