
[Verifier] Parallelize verification and dom checking. NFC.
ClosedPublic

Authored by lattner on Jun 13 2021, 9:42 PM.

Details

Summary

This changes the outer verification loop so that it does not recurse
into IsolatedFromAbove operations; instead, it returns them up to a
place where a parallel for loop can process them all in parallel. This
also changes dominance checking to happen on IsolatedFromAbove
chunks of the region tree, which makes it easy to fold operation
and dominance verification into a single, simple parallel regime.

This speeds up firtool in CIRCT from ~40s to 31s on a large
testcase in -verify-each mode (the default). The .fir parser and
module passes in particular benefit from this - FModule passes
(roughly analogous to function passes) were already running the
verifier in parallel as part of the pass manager. This allows
the whole-module passes to verify their enclosed functions /
FModules in parallel.

-no-verify-each mode is still faster (26.3s on the same testcase),
but we do expect the verifier to take *some* time.
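The scheme described above can be sketched with a toy op tree (illustrative only, not MLIR code; `ToyOp` is made up and `std::async` stands in for `llvm::parallelForEach` over a thread pool): the outer walk stops at IsolatedFromAbove boundaries and hands those subtrees to a parallel loop.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <vector>

// Toy stand-in for an operation: some nodes are "isolated from above".
struct ToyOp {
  bool isolatedFromAbove = false;
  std::vector<std::unique_ptr<ToyOp>> children;
};

// Verify one chunk of the tree, stopping at isolation boundaries and
// handing the deferred subtrees back to the caller via the worklist.
// Returns the number of ops verified (a stand-in for the real checks).
int verifyUpToIsolation(ToyOp &op, std::vector<ToyOp *> &worklist) {
  int verified = 1; // pretend we ran per-op + dominance checks here
  for (auto &child : op.children) {
    if (child->isolatedFromAbove)
      worklist.push_back(child.get()); // defer to the parallel loop
    else
      verified += verifyUpToIsolation(*child, worklist);
  }
  return verified;
}

// Fully verify one isolated chunk (sequentially in this sketch).
int verifyChunk(ToyOp &root) {
  int verified = 0;
  std::vector<ToyOp *> worklist{&root};
  while (!worklist.empty()) {
    ToyOp *op = worklist.back();
    worklist.pop_back();
    verified += verifyUpToIsolation(*op, worklist);
  }
  return verified;
}

// Outer driver: verify the top-level chunk, then fan the deferred
// isolated subtrees out in parallel.
int verifyOpAndDominance(ToyOp &root) {
  std::vector<ToyOp *> isolated;
  int verified = verifyUpToIsolation(root, isolated);
  std::vector<std::future<int>> results;
  for (ToyOp *op : isolated)
    results.push_back(std::async(std::launch::async,
                                 [op] { return verifyChunk(*op); }));
  for (auto &r : results)
    verified += r.get();
  return verified;
}
```

Because IsolatedFromAbove subtrees cannot reference values defined above them, each chunk can be checked without synchronizing with its siblings, which is what makes the fan-out safe.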

Diff Detail

Event Timeline

lattner created this revision.Jun 13 2021, 9:42 PM
lattner requested review of this revision.Jun 13 2021, 9:42 PM

I filed https://bugs.llvm.org/show_bug.cgi?id=50701 to track cases where the PassManager is verifying things it shouldn't IMO. Fixing that will be an orthogonal improvement to this patch.

rriddle accepted this revision.Jun 14 2021, 12:51 AM

LGTM, thanks!

mlir/lib/IR/Verifier.cpp
154

I don't think the getOperations here is necessary.

191

nit: Can you move this doc to the declaration instead?

239

nit: Spell out auto here.

241

Is the mlir:: here necessary?

242

op.getLoc()? I'm assuming this is what region.getLoc does internally.

350–352

Can we change this loop to use llvm::enumerate(op.getOperands()) ?
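For reference, the pattern being requested pairs each element with its index. A minimal eager stand-in can be sketched as follows (the real `llvm::enumerate` in `llvm/ADT/STLExtras.h` is lazy and yields elements with `.index()`/`.value()` accessors; this simplified version is just for illustration):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Eager, simplified stand-in for llvm::enumerate: pairs each element
// with its index so the loop body no longer maintains a counter.
template <typename T>
std::vector<std::pair<std::size_t, T>>
enumerate(const std::vector<T> &range) {
  std::vector<std::pair<std::size_t, T>> out;
  out.reserve(range.size());
  std::size_t index = 0;
  for (const T &value : range)
    out.emplace_back(index++, value);
  return out;
}
```

In the verifier this turns a manual `for (unsigned i = 0, e = op.getNumOperands(); i != e; ++i)` loop into `for (auto it : llvm::enumerate(op.getOperands()))`, with `it.index()` and `it.value()` replacing the counter and the per-iteration lookup.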

This revision is now accepted and ready to land.Jun 14 2021, 12:51 AM
lattner marked 6 inline comments as done.Jun 14 2021, 9:26 AM

Thank you for the quick review!

mlir/lib/IR/Verifier.cpp
191

Sure.

241

Yes, but not for a good reason: Verifier defined its own emitError for a weird reason. Fixed.

350–352

yes, nicer!

lattner updated this revision to Diff 351897.Jun 14 2021, 9:26 AM
lattner marked 3 inline comments as done.

Incorporate River's feedback.

bondhugula added inline comments.
mlir/lib/Pass/Pass.cpp
390 ↗(On Diff #351897)

This comment can be updated.

lattner added inline comments.Jun 14 2021, 10:02 AM
mlir/lib/Pass/Pass.cpp
390 ↗(On Diff #351897)

Thank you for catching this. This wasn't intended to be in this patch. I'll remove it and put it into a separate patch!

lattner updated this revision to Diff 351907.Jun 14 2021, 10:02 AM

Remove the changes to Pass.cpp, they were supposed to be in a follow-on

This revision was landed with ongoing or failed builds.Jun 14 2021, 10:03 AM
This revision was automatically updated to reflect the committed changes.

FYI we are once again seeing hangs triggered by this change. It appears to have something to do with LLVM thread pool cleanup. We've hit it in IREE's OSS python tests, Google's internal runs of MLIR's cuda integration tests, and IREE's iree-run-mlir tests (https://source.cloud.google.com/results/invocations/7ed7a557-2db6-4f7a-84e9-9e2dec6b1405/). So far we haven't been able to get a reproduction upstream, so it seems to require using MLIR in some specific environment.

This is just using llvm::parallelForEachN (not doing anything particularly fancy) so I can't imagine how it would be different than other similar things using it. It is possible this is exposing a lower level problem in LLVM threading.

In any case, let me know how I can help. I'd prefer not to revert this though, as it is a significant speedup.

> This is just using llvm::parallelForEachN (not doing anything particularly fancy) so I can't imagine how it would be different than other similar things using it. It is possible this is exposing a lower level problem in LLVM threading.
>
> In any case, let me know how I can help. I'd prefer not to revert this though, as it is a significant speedup.

I believe it is exposing such a lower-level problem, yeah. At least that's our guess, though it's a pretty prickly problem.

> This is just using llvm::parallelForEachN (not doing anything particularly fancy) so I can't imagine how it would be different than other similar things using it. It is possible this is exposing a lower level problem in LLVM threading.
>
> In any case, let me know how I can help. I'd prefer not to revert this though, as it is a significant speedup.

I am partially speculating (but I've also partially triaged this before on various platforms and don't think I'm wholly wrong). I suspect we're running up against something that will cause another paragraph to be written for the Executor *Executor::getDefaultExecutor() method in Parallel.cpp. With this patch, we are now getting intermittent hangs on exit/global destruct depending on which shared libraries have been loaded in the host process on Linux. I don't think there is anything wrong with this patch, per se, but it is quite consistently triggering some really nasty behavior across a variety of our projects and Linux platforms.

In general, I've found these kinds of global destructor thread shutdowns to be finicky at best, with odd interactions with respect to destruction order, platform, libraries loaded, etc. I don't have further information at this time, but at least speaking up for posterity seems appropriate.

I've got some pretty strong evidence that this is indeed deadlocking during verification processing, but I can't quite explain why.

I was able to capture this backtrace of all threads from a statically compiled iree-translate on Linux doing a very straight-forward sample program compile. We've got evidence of similar deadlocks in a number of other cases across platforms and compilers but have not had the right builds with debug info locally to verify the same deadlock (though it stands to reason):

Thread 3 (Thread 0x7f29fb119700 (LWP 4046230)):
#0  0x00007f29fbb369f4 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/grte/v4/lib64/libpthread.so.0
#1  0x000055e48a8a13b2 in std::__u::__libcpp_condvar_wait (__cv=0x7f29fb118794, __m=0x80) at third_party/llvm/llvm-project/libcxx/include/__threading_support:440
#2  std::__u::condition_variable::wait (this=0x7f29fb118794, lk=...) at third_party/llvm/llvm-project/libcxx/src/condition_variable.cpp:44
#3  0x000055e48a7c260b in llvm::parallel::detail::TaskGroup::~TaskGroup() () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__mutex_base:406
#4  0x000055e48a70dc16 in (anonymous namespace)::OperationVerifier::verifyOpAndDominance(mlir::Operation&) () at third_party/llvm/llvm-project/llvm/include/llvm/Support/Parallel.h:186
#5  0x000055e48a70faff in void std::__u::__function::__policy_invoker<void ()>::__call_impl<std::__u::__function::__default_alloc_func<llvm::parallel::detail::parallel_for_each_n<unsigned long, (anonymous namespace)::OperationVerifier::verifyOpAndDominance(mlir::Operation&)::$_0>(unsigned long, unsigned long, (anonymous namespace)::OperationVerifier::verifyOpAndDominance(mlir::Operation&)::$_0)::{lambda()#1}, void ()> >(std::__u::__function::__policy_storage const*) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/IR/Verifier.cpp:105
#6  0x000055e48a7c49f1 in void std::__u::__function::__policy_invoker<void ()>::__call_impl<std::__u::__function::__default_alloc_func<llvm::parallel::detail::TaskGroup::spawn(std::__u::function<void ()>)::$_0, void ()> >(std::__u::__function::__policy_storage const*) () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/functional:2230
#7  0x000055e48a7c37c4 in llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::work(llvm::ThreadPoolStrategy, unsigned int) () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/functional:2230
#8  0x000055e48a7c3850 in void* std::__u::__thread_proxy<std::__u::tuple<std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::{lambda()#1}::operator()() const::{lambda()#1}> >(llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::{lambda()#1}::operator()() const::{lambda()#1}) () at /proc/self/cwd/third_party/llvm/llvm-project/llvm/lib/Support/Parallel.cpp:52
#9  0x00007f29fbb324e8 in start_thread () from /usr/grte/v4/lib64/libpthread.so.0
#10 0x00007f29fb9a722d in clone () from /usr/grte/v4/lib64/libc.so.6

Thread 2 (Thread 0x7f29fb91b700 (LWP 4046229)):
#0  0x00007f29fbb369f4 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/grte/v4/lib64/libpthread.so.0
#1  0x000055e48a8a13b2 in std::__u::__libcpp_condvar_wait (__cv=0x7f29fb91a744, __m=0x80) at third_party/llvm/llvm-project/libcxx/include/__threading_support:440
#2  std::__u::condition_variable::wait (this=0x7f29fb91a744, lk=...) at third_party/llvm/llvm-project/libcxx/src/condition_variable.cpp:44
#3  0x000055e48a7c260b in llvm::parallel::detail::TaskGroup::~TaskGroup() () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__mutex_base:406
#4  0x000055e48a70dc16 in (anonymous namespace)::OperationVerifier::verifyOpAndDominance(mlir::Operation&) () at third_party/llvm/llvm-project/llvm/include/llvm/Support/Parallel.h:186
#5  0x000055e48a70faff in void std::__u::__function::__policy_invoker<void ()>::__call_impl<std::__u::__function::__default_alloc_func<llvm::parallel::detail::parallel_for_each_n<unsigned long, (anonymous namespace)::OperationVerifier::verifyOpAndDominance(mlir::Operation&)::$_0>(unsigned long, unsigned long, (anonymous namespace)::OperationVerifier::verifyOpAndDominance(mlir::Operation&)::$_0)::{lambda()#1}, void ()> >(std::__u::__function::__policy_storage const*) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/IR/Verifier.cpp:105
#6  0x000055e48a7c49f1 in void std::__u::__function::__policy_invoker<void ()>::__call_impl<std::__u::__function::__default_alloc_func<llvm::parallel::detail::TaskGroup::spawn(std::__u::function<void ()>)::$_0, void ()> >(std::__u::__function::__policy_storage const*) () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/functional:2230
#7  0x000055e48a7c37c4 in llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::work(llvm::ThreadPoolStrategy, unsigned int) () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/functional:2230
#8  0x000055e48a7c35ce in void* std::__u::__thread_proxy<std::__u::tuple<std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::{lambda()#1}> >(std::__u::tuple<std::__u::unique_ptr<std::__u::__thread_struct, std::__u::default_delete<std::__u::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::{lambda()#1}>) () at /proc/self/cwd/third_party/llvm/llvm-project/llvm/lib/Support/Parallel.cpp:57
#9  0x00007f29fbb324e8 in start_thread () from /usr/grte/v4/lib64/libpthread.so.0
#10 0x00007f29fb9a722d in clone () from /usr/grte/v4/lib64/libc.so.6

Thread 1 (Thread 0x7f29fb96dcc0 (LWP 4046189)):
#0  0x00007f29fbb369f4 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/grte/v4/lib64/libpthread.so.0
#1  0x000055e48a8a13b2 in std::__u::__libcpp_condvar_wait (__cv=0x7fffcdc78f74, __m=0x80) at third_party/llvm/llvm-project/libcxx/include/__threading_support:440
#2  std::__u::condition_variable::wait (this=0x7fffcdc78f74, lk=...) at third_party/llvm/llvm-project/libcxx/src/condition_variable.cpp:44
#3  0x000055e48a7c260b in llvm::parallel::detail::TaskGroup::~TaskGroup() () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__mutex_base:406
#4  0x000055e48a70dc16 in (anonymous namespace)::OperationVerifier::verifyOpAndDominance(mlir::Operation&) () at third_party/llvm/llvm-project/llvm/include/llvm/Support/Parallel.h:186
#5  0x000055e48a70d925 in mlir::verify(mlir::Operation*) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/IR/Verifier.cpp:379
#6  0x000055e48a5d7cbe in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:409
#7  0x000055e48a5d80e6 in mlir::detail::OpToOpPassAdaptor::runPipeline(llvm::iterator_range<llvm::pointee_iterator<std::__u::unique_ptr<mlir::Pass, std::__u::default_delete<mlir::Pass> >*, mlir::Pass> >, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:444
#8  0x000055e48a5d9c54 in mlir::PassManager::run(mlir::Operation*) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:710
#9  0x000055e4892342e8 in mlir::iree_compiler::translateFromMLIRToVMBytecodeModuleWithFlags(mlir::ModuleOp, llvm::raw_ostream&) () at /proc/self/cwd/third_party/iree/iree/compiler/Translation/IREEVM.cpp:217
#10 0x000055e48a371c8c in mlir::LogicalResult std::__u::__function::__policy_invoker<mlir::LogicalResult (llvm::SourceMgr&, llvm::raw_ostream&, mlir::MLIRContext*)>::__call_impl<std::__u::__function::__default_alloc_func<mlir::TranslateFromMLIRRegistration::TranslateFromMLIRRegistration(llvm::StringRef, std::__u::function<mlir::LogicalResult (mlir::ModuleOp, llvm::raw_ostream&)> const&, std::__u::function<void (mlir::DialectRegistry&)>)::$_1, mlir::LogicalResult (llvm::SourceMgr&, llvm::raw_ostream&, mlir::MLIRContext*)> >(std::__u::__function::__policy_storage const*, llvm::SourceMgr&, llvm::raw_ostream&, mlir::MLIRContext*) () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/functional:2230
#11 0x000055e4872b5ea2 in main () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/functional:2230

I was not successful in getting a deadlock in debug mode, but I did get one with ASAN enabled, which revealed an additional clue in the stack trace:

#0  0x00007f9ad5a6a9f4 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/grte/v4/lib64/libpthread.so.0
#1  0x000056331a2e55da in std::__u::__libcpp_condvar_wait (__cv=0x7f9ad29ef454, __m=0x80) at third_party/llvm/llvm-project/libcxx/include/__threading_support:440
#2  std::__u::condition_variable::wait (this=0x7f9ad29ef454, lk=...) at third_party/llvm/llvm-project/libcxx/src/condition_variable.cpp:44
#3  0x0000563319efd4eb in llvm::parallel::detail::Latch::sync() const () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__mutex_base:406
#4  0x0000563319efce52 in llvm::parallel::detail::TaskGroup::~TaskGroup() () at third_party/llvm/llvm-project/llvm/include/llvm/Support/Parallel.h:43
#5  0x0000563319d4416d in (anonymous namespace)::OperationVerifier::verifyOpAndDominance(mlir::Operation&) () at third_party/llvm/llvm-project/llvm/include/llvm/Support/Parallel.h:186
#6  0x0000563319d43a2c in mlir::verify(mlir::Operation*) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/IR/Verifier.cpp:379
#7  0x0000563319978e6f in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:409
#8  0x0000563319979dc1 in mlir::detail::OpToOpPassAdaptor::runPipeline(llvm::iterator_range<llvm::pointee_iterator<std::__u::unique_ptr<mlir::Pass, std::__u::default_delete<mlir::Pass> >*, mlir::Pass> >, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:444
#9  0x000056331997eddf in mlir::PassManager::run(mlir::Operation*) () at /proc/self/cwd/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:710
#10 0x000056331599da45 in mlir::iree_compiler::translateFromMLIRToVMBytecodeModuleWithFlags(mlir::ModuleOp, llvm::raw_ostream&) () at /proc/self/cwd/third_party/iree/iree/compiler/Translation/IREEVM.cpp:217
#11 0x00005633191f174b in mlir::LogicalResult std::__u::__function::__policy_invoker<mlir::LogicalResult (llvm::SourceMgr&, llvm::raw_ostream&, mlir::MLIRContext*)>::__call_impl<std::__u::__function::__default_alloc_func<mlir::TranslateFromMLIRRegistration::TranslateFromMLIRRegistration(llvm::StringRef, std::__u::function<mlir::LogicalResult (mlir::ModuleOp, llvm::raw_ostream&)> const&, std::__u::function<void (mlir::DialectRegistry&)>)::$_1, mlir::LogicalResult (llvm::SourceMgr&, llvm::raw_ostream&, mlir::MLIRContext*)> >(std::__u::__function::__policy_storage const*, llvm::SourceMgr&, llvm::raw_ostream&, mlir::MLIRContext*) () at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/functional:2230
#12 0x000056330f467f9e in main::$_0::operator()(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer> >, llvm::raw_ostream&) const ()
    at third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/functional:2230
#13 0x000056330f46706d in main () at /proc/self/cwd/third_party/iree/iree/tools/iree-translate-main.cc:133

Basically the same except that this one captures the offending frame: everything is deadlocking on the Latch::sync() of ~TaskGroup().

I and a couple of others spent ~an hour piecing apart the Parallel.h/cpp code and couldn't spot the race, but I believe there is an issue here that the verifier parallelism is tickling. Probably latent and just triggering based on workload ergonomics? In any case, I think we should disable parallelism in this patch for now. It's a great speedup but not at the cost of stability.

I've sent out https://reviews.llvm.org/D104516 as a potential future direction for the threading in MLIR. It adds a ThreadPool to MLIRContext that can be used to provide more consistent thread usage, and I believe it provides a few other advantages as well (such as avoiding any potential static destruction issues).

For a while we only had repros using Blaze or Bazel. This is a pretty thorny bug to pin down. I managed to get it to hang using an OSS build of IREE with CMake though. Here's the backtrace for all threads: https://gist.github.com/GMNGeoffrey/0bdf40644efcda68db0c0ace734273b3

Ok, well I'm still not clear what is going on here - if the static dtor for the thread pool is running while there is still MLIR stuff going on then there is going to be all sorts of bad things that come unraveled. However, I'm totally ok with River's patch to add threadpool to MLIRContext.

> Ok, well I'm still not clear what is going on here - if the static dtor for the thread pool is running while there is still MLIR stuff going on then there is going to be all sorts of bad things that come unraveled. However, I'm totally ok with River's patch to add threadpool to MLIRContext.

Yeah, I don't know yet either, despite a lot of time staring at it. But there is definitely something going on across a couple of projects (including just core mlir tests) at a low rate, which shows up for us given the high rate of test runs. Given the frequency, it is hard to repro. But I'm also +1 on River's patch and would like to see that go in.

Have you tried shoving the global executor into a ManagedStatic?

> Have you tried shoving the global executor into a ManagedStatic?

Not the executor itself. The ThreadPool is in a ManagedStatic. Most of the triage last week was assuming this to be a shutdown hang, and it was while reproducing that (i.e. doing explicit llvm_shutdown of ManagedStatics) that Geoffrey and I both discovered that the hang/deadlock is actually happening during verification, not at shutdown, and we finally got the backtraces from a process that wasn't doing anything "funny" with threads or global state (i.e. a stock *-translate executable). It took quite a bit of troubleshooting to get it that far because it only seems to happen in optimized builds, and most of our bots that test those at a high enough frequency were not building with enough symbols to debug much. (we have also had exit time issues with some systems, and I think this was incorrectly pattern matched to that early on)

I don't have a theory on the root cause right now, despite having looked at this quite a bit. River's patch does not seem to cause a hang. I don't think that is related to pulling the thread pool to the context level (which is a good change on its own, imo) but to something else being done differently. We may be leaving a race latent in the old path -- I just don't know at this point.

The backtrace Stella posted above https://reviews.llvm.org/D104207#2826290 is an example of hang that isn't during shutdown.

It is really puzzling because it looks like multiple threads managed to get their TaskGroup with parallelism enabled, which isn't supposed to be possible. Many of us stared at the code for a while but we haven't made progress! (it really does not help that this is hard to reproduce...)

The problem with pulling this into an MLIRContext is that parallelism isn't specific to MLIR. It is specific to the machine that is being run on. It's not like MLIR gets some cpus and (LLVM or higher-level SW) gets others.

-Chris

> The problem with pulling this into an MLIRContext is that parallelism isn't specific to MLIR. It is specific to the machine that is being run on. It's not like MLIR gets some cpus and (LLVM or higher-level SW) gets others.
>
> -Chris

Agreed - and this applies even more so to the current state of having it as an LLVM-level static. Ultimately, I'm fine with APIs like MLIRContext which default to taking control of their own threading context, but any system that is actually trying to optimally manage its threading environment needs an ability to inject/control how threading is done (sharing thread pools, limiting parallelism, etc.). Not tying the ThreadPool/global executor to process-level statics seems like a good step in that direction (I haven't looked at River's patch in detail to determine how much it boxes us out of more configurability down the road). I would assume that follow-on APIs would allow the creation of MLIRContext instances with more control over their threading resources.

> The problem with pulling this into an MLIRContext is that parallelism isn't specific to MLIR. It is specific to the machine that is being run on. It's not like MLIR gets some cpus and (LLVM or higher-level SW) gets others.
>
> -Chris

We already have the situation where MLIR doesn't share thread pools with higher level SW, because that higher level SW doesn't necessarily try to use the same threading APIs (well technically right now even if it did they wouldn't get multi-threaded execution, but that is more of a current implementation problem). I know several downstream users where this is the case. If we want to share thread pools with higher level SW, we likely should have a virtual thread pool interface that gets used so that users can hook in their own implementations. If we go that route, I don't see much difference between static or not, given that MLIR usages of threading already requires access to the context (to see if threading is disabled, handling diagnostics, etc.) The major benefits of non-static, are that it is much much easier to understand/debug issues, and provides much greater control over the thread pool used.
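Since several replies circle around injectable threading, here is a sketch of the kind of virtual thread-pool interface being suggested (all names here are hypothetical; neither MLIR nor LLVM defines exactly this API): a context owns a default implementation but lets an embedder substitute its own.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <vector>

// Abstract hook an embedder can implement to control how parallel
// work is executed (share a pool, cap parallelism, run inline, ...).
class ThreadingInterface {
public:
  virtual ~ThreadingInterface() = default;
  virtual void parallelFor(std::size_t begin, std::size_t end,
                           const std::function<void(std::size_t)> &fn) = 0;
};

// Default implementation: one task per index via std::async. A real
// pool would batch indices across a fixed set of worker threads.
class AsyncThreading : public ThreadingInterface {
public:
  void parallelFor(std::size_t begin, std::size_t end,
                   const std::function<void(std::size_t)> &fn) override {
    std::vector<std::future<void>> tasks;
    for (std::size_t i = begin; i != end; ++i)
      tasks.push_back(std::async(std::launch::async, fn, i));
    for (auto &t : tasks)
      t.get();
  }
};

// A context that takes control of its own threading by default but
// accepts an override, instead of relying on a process-level static.
class ToyContext {
  std::unique_ptr<ThreadingInterface> threading;
public:
  ToyContext() : threading(std::make_unique<AsyncThreading>()) {}
  void setThreading(std::unique_ptr<ThreadingInterface> t) {
    threading = std::move(t);
  }
  ThreadingInterface &getThreading() { return *threading; }
};
```

The design point is that ownership is explicit: the pool's lifetime is tied to the context rather than to static destruction order, which sidesteps the shutdown-ordering hazards discussed in this thread.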

Outside of the current verifier incident, I've also run into this same issue when multi-threading the inliner. It was equally painful to debug, extremely hard to reliably reproduce, and I was only barely able to massage things enough so that it disappeared.

I think D61115 does not completely solve the issue. The issue you're seeing in https://reviews.llvm.org/D104207#2826290 is essentially PR41508.
The sequence that leads to the bug seems to be:

  1. main() calls parallelForEachN with, say, only one element to be processed by the TaskGroup.
  2. Since it's the first active TaskGroup, TaskGroupInstances is incremented to 1, and the TaskGroup::Parallel flag is set.
  3. TaskGroup::spawn() is called and since we're parallel we construct the ThreadPoolExecutor and push the work onto its work stack.
  4. parallelForEachN goes out of scope and calls the destructor for TaskGroup.
  5. TaskGroupInstances is decremented to 0.
  6. The destructor of Latch is called and calls sync() which suspends the main thread.
  7. The work that was scheduled in 3. is finally executed on a ThreadPoolExecutor thread, so another parallelForEachN is called.
  8. A new TaskGroup is constructed on the stack but since TaskGroupInstances is 0, we start again from 2. (incrementing it to 1, then we set Parallel to true and we schedule tasks on the ThreadPoolExecutor)
  9. In your case, 3.-8. happens several times.

At this point we can end up in a situation where all ThreadPoolExecutor's threads are waiting in Latch.sync() and any potential task that could have unblocked them cannot be run.
This creates the situation in PR41508 or what you're seeing above.
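The faulty ordering can be replayed single-threaded with a toy model (hypothetical `ToyTaskGroup`/`destroyBuggy`/`destroyFixed` names; no real threads are involved, so nothing here can actually deadlock): decrementing the counter before the latch sync lets a task that runs during the wait observe a count of 0 and re-enable parallelism.

```cpp
// Global counter mirroring llvm::parallel::detail::TaskGroupInstances.
static int taskGroupInstances = 0;

struct ToyTaskGroup {
  bool parallel;
  // Only the first live group claims parallel execution.
  ToyTaskGroup() : parallel(taskGroupInstances++ == 0) {}
  // Buggy order from the sequence above: the count drops to 0 *before*
  // the destructor blocks in Latch::sync().
  void destroyBuggy() { --taskGroupInstances; /* ...then L.sync() */ }
  // Proposed fix: sync first, decrement only once all tasks finished.
  void destroyFixed() { /* L.sync() first... */ --taskGroupInstances; }
};

// Models steps 4-8: while the outer group's destructor is (conceptually)
// blocked in sync(), a deferred task constructs a nested group. Returns
// whether that nested group believes it may run in parallel.
bool nestedSeesParallel(bool buggyOrder) {
  taskGroupInstances = 0;
  ToyTaskGroup outer;          // steps 1-2: Parallel flag set
  if (buggyOrder)
    outer.destroyBuggy();      // step 5: count already back at 0
  ToyTaskGroup nested;         // step 8: re-checks the counter
  bool result = nested.parallel;
  --taskGroupInstances;        // nested group goes away
  if (!buggyOrder)
    outer.destroyFixed();      // fixed order: decrement happens last
  return result;
}
```

With the buggy ordering the nested group also claims parallelism, so two groups can queue work on the executor while its threads are parked in Latch::sync(); with the fixed ordering the nested group falls back to sequential execution, closing the window.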

Probably changing to something like:
TaskGroup::~TaskGroup() { L.sync(); --TaskGroupInstances; }
would fix it, but this has to be confirmed by someone who could repro the bug in the first place.
Also, calling Cond.notify_all() inside the lock isn't optimal; it could be done outside of the lock.

What do you think?