[Dominators] Rewrite the dominator implementation for efficiency. NFC.
The previous impl densely scanned the entire region starting with an op
when dominators were created, creating a DominatorTree for every region.
This is extremely expensive up front -- particularly for clients like
Linalg/Transforms/Fusion.cpp that construct DominanceInfo for a single
query. It is also extremely memory wasteful for IRs that use single
block regions commonly (e.g. affine.for) because it's making a
dominator tree for a region that has trivial dominance. The
implementation also had numerous unnecessary minor efficiencies, e.g.
doing multiple walks of the region tree or tryGetBlocksInSameRegion
building a DenseMap that it didn't need.
This patch switches to an approach where [Post]DominanceInfo is free
to construct, and which lazily constructs DominatorTree's for any
multiblock regions that it needs. This avoids the up-front cost
entirely, making its runtime proportional to the complexity of the
region tree instead of # ops in a region. This also avoids the memory
and time cost of creating DominatorTree's for single block regions.
Finally this rewrites the implementation for simplicity and to avoids
the constant factor problems the old implementation had.
Differential Revision: https://reviews.llvm.org/D103384