This is an archive of the discontinued LLVM Phabricator instance.

[Verifier] Speed up and parallelize dominance checking. NFC
AbandonedPublic

Authored by lattner on May 29 2021, 5:08 PM.

Details

Summary

One of the key algorithms used in the "mlir::verify(op)" method is the
dominance checker, which ensure that operand values properly dominate
the operations that use them. This is currently slower than it needs
to be, so let's parallelize it:

  1. It splits dominance checking into "IsolatedFromAbove" units. Instead of building a monolithic DominanceInfo for everything in a module, for example, it checks dominance for the module to all the functions within it (noop, since there are no operands at this level) then each function gets their own DominanceInfo for each of their scope.
  2. It parallelizes analysis of collections IsolatedFromAbove operations, e.g. each of the functions within a Module.

All together this is more than a 10% speedup on firtool in circt on a
large design when run in -verify-each mode (our default) since the verifier
is invoked after each pass.

Still todo is to parallelize the main verifier pass. I decided to split
this out to its own thing since this patch is already large-ish.

Diff Detail

Event Timeline

lattner created this revision.May 29 2021, 5:08 PM
lattner requested review of this revision.May 29 2021, 5:08 PM
lattner updated this revision to Diff 348667.May 29 2021, 5:13 PM

Improve comments.

rriddle added inline comments.May 29 2021, 5:35 PM
mlir/lib/IR/Verifier.cpp
343–345

Can you just use Operation::isBeforeInBlock? I don't think there needs to be a look up table here.

lattner added inline comments.May 30 2021, 10:22 AM
mlir/lib/IR/Verifier.cpp
343–345

yes, I should have remembered that. That provides an additional speedup, and allows generalizing this a bit more. I'll incorporate that in my next revision, thanks!

lattner updated this revision to Diff 348705.May 30 2021, 10:39 AM

Use isBeforeInBlock instead of maintaining a SmallPtrSet.

Actually, instead of making this more complicated, I'd like to keep this simple in this patch as it is.

In a different work thread, I feel compelled to completely replace the Dominator implementation: there is no good reason for it to be as slow as it is, and it is being horribly misused by some clients (e.g. Linalg/.../Fusion.cpp). Making it lazy and not making an llvm::domtree for single block regions will solve this problem and define away problems for other clients.

That said, I'd like to take this step and do that in parallel, then re-simplify the verifier if it works out.

The dominators rewrite is here: https://reviews.llvm.org/D103384. Once that goes into tree I'll merge it into this patch to simplify it - we can drop the "don't recurse into IsolatedFromAbove regions" change clearly and the manual laziness isn't needed, but the parallel constructs are.

rriddle accepted this revision.May 31 2021, 10:29 PM

LGTM, especially if the DominanceInfo stuff is going to be cleaned up in a followup. Thanks for cleaning this up!

mlir/lib/IR/Verifier.cpp
374–381
433

The verifier isn't guaranteed to be called in a situation that can handle diagnostics being emitted in parallel. I think we need to use ParallelDiagnosticHandler here.

(Honestly we should just add some MLIR specific threading utilities that wrap all of this automatically)

This revision is now accepted and ready to land.May 31 2021, 10:29 PM

SGTM, I'll revise this after the other patch lands, thank you for the review!

lattner updated this revision to Diff 349011.Jun 1 2021, 10:22 AM

Updates from River's review

lattner marked an inline comment as done and an inline comment as not done.Jun 1 2021, 10:24 AM
lattner added inline comments.
mlir/lib/IR/Verifier.cpp
433

Great catch. I'm not super familiar with ParallelDiagnosticHandler. Does this look like the right thing?

rriddle added inline comments.Jun 1 2021, 12:46 PM
mlir/lib/IR/Verifier.cpp
439

When using ParallelDiagnosticHandler you need to set an order for each of the child threads that get created (this is what the handler uses to make sure diagnostics get ordered properly):

https://mlir.llvm.org/docs/Diagnostics/#parallel-diagnostic-handler
https://github.com/llvm/llvm-project/blob/e9f82e9e1b6851ff66010a0e500346379b1ef572/mlir/include/mlir/IR/Diagnostics.h#L628

lattner updated this revision to Diff 349103.Jun 1 2021, 2:47 PM

Use setOrderIDForThread correctly.

lattner marked an inline comment as done.Jun 1 2021, 2:48 PM
lattner added inline comments.
mlir/lib/IR/Verifier.cpp
439

Got it, thanks. This is actually a funny situation where we don't care what order the diagnostics come out in. We only care about "no errors or some errors". That said, I incorporated this since I expect there is little cost to doing so.

I'm going to land the other patch and then merge it into this one, I'll ping this when I have done so. Thanks!

lattner updated this revision to Diff 349114.Jun 1 2021, 3:20 PM
lattner marked an inline comment as done.

Dramatically simplify this patch by merging in the dominator improvements.

lattner updated this revision to Diff 349115.Jun 1 2021, 3:22 PM

Final grooming, remove commented out #include, use range base for loop again.

Ok, I think this patch is ready for a re-check. Thanks!

lattner updated this revision to Diff 349117.Jun 1 2021, 3:29 PM

Update commit message

lattner updated this revision to Diff 349118.Jun 1 2021, 3:32 PM
lattner edited the summary of this revision. (Show Details)

update commit message for real this time.

I'm abandoning this patch. With the other general improvements to dominators and with the inefficiency in the threading stuff, this is no longer a win.

lattner abandoned this revision.Jun 1 2021, 4:41 PM
This revision was landed with ongoing or failed builds.Jun 8 2021, 9:47 AM

I thought you abandoned this change? Looks like you landed it.

I'm not sure how this landed, it looks like it got pushed as a consequence of the twine patch. Maybe I put them both into the same directory accidentally (my git fu is not great).

I'll clean this up when I have a chance soon, I do have a plan to parallelize the verifier, I will merge the dom checking into that flow and make the whole thing fast. I'll take care of it then. Until then I don't think this patch causes any significant harm.

I'm sorry for flubbing this though!!! If you'd like to revert this, please feel free.

Run ln -sf ../../llvm/utils/git/pre-push.py .git/hooks/pre-push in your local repo: it'll protect you against accidentally pushing two commits by asking confirmation when this happens

We are seeing a mysterious hang in IREE's python tests after this change that happens about 90% of the time (making the test flakey). Entirely possible that it's something broken in our python bindings, especially given that I haven't yet seen it in anything that doesn't go through the bindings. But given that it was apparently landed accidentally, perhaps you'd consider a revert :-)

mlir/lib/IR/Verifier.cpp
414

Changing this to else if(false) gets rid of the hanging

(notably the python tests are just shelling out to IREE binaries, so "bindings" is a strong word here 😁)

Oh and I also get a build failure because of a missing include of atomic:

/usr/local/google/home/gcmn/src/iree/third_party/llvm-project/mlir/lib/IR/Verifier.cpp:365:27: error: implicit instantiation of undefined template 'std::atomic<bool>'
        std::atomic<bool> passFailed(false);
                          ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/atomic_base.h:152:12: note: template is declared here
    struct atomic;

which is also failing on the premerge checks.

Update: we suspect this is something in our python logic for piping subprocesses together, so likely this change is just a bystander :-)

I don't know how to revert this, but feel free if you see it as a smoking gun. I only abandoned this because "it didn't show a big speedup and I have a hopefully better idea of how to improve things", not because I think this is harmful

The new patch is available here: https://reviews.llvm.org/D103373/new/

The difference is that it parallelizes operation verification *and* dominance checks, pulling them into a common framework. This helps the granularity issues that made this patch be not a significant win.

Yes, thank you! Too many similar things.