On my 96-core cloudtop 'machine', it seems unnecessary to always start
96 threads upfront... particularly as the ThreadPool is created even
with -mlir-disable-threading. Things like the resuling spew in GDB and
the obfuscated output of (gdb) info threads are my motivation here,
but it probably also doesn't hurt for at least some efficiency metrics to
avoid creating many threads upfront.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Support/ThreadPool.cpp | ||
---|---|---|
24 | Here actually I'm really not sure: what should be the max number of threads in Threads: should it be Strategy.compute_thread_count() or should it be that minus one to account for the main thread? it looks like the existing code was doing the former but (as actually confirmed by GDB) that meant it had one more thread running than the detected hardware concurrency. Which could be fine if we expect that the main thread would be idle most of the time? What was the intent here? |
llvm/include/llvm/Support/ThreadPool.h | ||
---|---|---|
69–73 | Please decide what we want here? The immediate problem that led me to add +1 here was that when Threads was empty, I was now returning 0, and the caller was clearly expecting at least 1: At mlir/include/mlir/IR/Threading.h:75 size_t numActions = std::min(numElements, threadPool.getThreadCount()); SmallVector<std::shared_future<void>> threadFutures; threadFutures.reserve(numActions - 1); for (unsigned i = 1; i < numActions; ++i) threadFutures.emplace_back(threadPool.async(processFn)); |
Got it, getThreadCount() was really meant to return the max number of threads, not the current number. Those two numbers used to be the same before this patch.
llvm/include/llvm/Support/ThreadPool.h | ||
---|---|---|
69–73 | Self-replying: this call site really shows that getThreadCount was expected to return the potential max number of threads, which incidentally was equal to the current number of threads before this diff, but that's what is changing here. To avoid breaking existing users, I reverted getThreadCount to this behavior (I believe that the earlier state of this diff would have effectively kept the number of threads to 1. Now I've checked in GDB that we do create a dozen threads for a simple lit test). | |
llvm/lib/Support/ThreadPool.cpp | ||
24 | Got it, nevermind, updated this diff (see reply to the other comment thread). |
particularly as the ThreadPool is created even with -mlir-disable-threading.
I'm fairly sure I fixed this one or two months ago, at least the C++ API allows to setup a project without starting a ThreadPool at all. This was a bottleneck in TensorFlow for some small MLIR work we're doing there.
There might be some similar plumbing we could do in how we handle --mlir-disable-threading with mlir-opt as well, since it is a testing tool we haven't tried to "optimize" this kind of things (you're making a good case for it though!).
That said, I like this change to lazily create thread as we go: so that even in cases where I want parallelism, I won't create more threads than actually needed!
llvm/lib/Support/ThreadPool.cpp | ||
---|---|---|
27 | Nit: remove trivial braces |
llvm/include/llvm/Support/ThreadPool.h | ||
---|---|---|
71 | I'm not fond of keeping the API name as-is with a new semantics. What about removing it and using two APIs instead:
|
Indeed, it is in mlir-opt that I was observing that behavior this week.
llvm/include/llvm/Support/ThreadPool.h | ||
---|---|---|
71 | There are 3 call sites in one Clang file, and 2 call sites in 2 MLIR files. That's a little more than I feel save updating all in one shot. Are you OK to keep the current name for now, and defer to a follow-up? At least the behavior isn't changing, these existing users are still going to get effectively the same result value. For the new methods: I agree that the current getThreadCount should get renamed getMaxThreadCount. I don't feel comfortable with getAvailableThreadCount() => return Threads.size() because that would conflict with the meaning of "available" in the existing member AvailableThreads. How about getCurrentThreadCount ? |
llvm/include/llvm/Support/ThreadPool.h | ||
---|---|---|
150 | Nit: it won't have any impact right now, but can you guard this declaration with LLVM_ENABLE_THREADS? That would lead to compile time failures instead of link-time failures if someone changes something incorrectly in the future. |
Thanks for the quick review! I don't have permissions to push by myself, FWIW.
I'm also running this through google global presubmits tonight... let's wait until tomorrow to submit this.
Looks like this is breaking the buildbot:
https://lab.llvm.org/buildbot/#/builders/37/builds/8903
salient log extract:
[127/558] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o /b/sanitizer-x86_64-linux/build/llvm_build64/bin/clang++ -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Support -I/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support -Iinclude -I/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -m32 -fPIC -flto -Os -g0 -DNDEBUG -fno-rtti -fno-exceptions -nostdinc++ -I/b/sanitizer-x86_64-linux/build/symbolizer_build32/symbolizer/zlib -isystem /b/sanitizer-x86_64-linux/build/symbolizer_build32/symbolizer/libcxx/include/x86_64-unknown-linux-gnu/c++/v1 -isystem /b/sanitizer-x86_64-linux/build/symbolizer_build32/symbolizer/libcxx/include/c++/v1 -Wno-error=global-constructors -std=c++14 -fno-exceptions -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/ThreadPool.cpp.o -c /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp:97:13: error: redefinition of 'ThreadPool' ThreadPool::ThreadPool(ThreadPoolStrategy S) ^ /b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/ThreadPool.h:43:3: note: previous definition is here ThreadPool(ThreadPoolStrategy S = hardware_concurrency()) ^ /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp:98:7: error: initializer 'ThreadCount' does not name a non-static data member or base class; did you mean the member 'MaxThreadCount'? : ThreadCount(S.compute_thread_count()) { ^~~~~~~~~~~ MaxThreadCount /b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/ThreadPool.h:177:18: note: 'MaxThreadCount' declared here const unsigned MaxThreadCount; ^ /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp:99:7: error: use of undeclared identifier 'ThreadCount' if (ThreadCount != 1) { ^ /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Support/ThreadPool.cpp:100:56: error: use of undeclared identifier 'ThreadCount'; did you mean 'MaxThreadCount'? errs() << "Warning: request a ThreadPool with " << ThreadCount ^~~~~~~~~~~ MaxThreadCount /b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/ThreadPool.h:177:18: note: 'MaxThreadCount' declared here const unsigned MaxThreadCount; ^ 4 errors generated.
https://lab.llvm.org/buildbot/ is broken for me right now though (502 Bad Gateway), does it work for you?
Sorry, all, for the breakage, and thanks for handling it while I was away!!
It looks like I never paid attention to the not-LLVM_ENABLE_THREADS path.
I'm not fond of keeping the API name as-is with a new semantics.
What about removing it and using two APIs instead: