Page MenuHomePhabricator

Implementation of the root ordering algorithm
ClosedPublic

Authored by sfuniak on Aug 23 2021, 5:31 AM.

Details

Summary

This is commit 3 of 4 for the multi-root matching in PDL, discussed in https://llvm.discourse.group/t/rfc-multi-root-pdl-patterns-for-kernel-matching/4148 (topic flagged for review).

We form a graph over the specified roots, provided in pdl.rewrite, where two roots are connected by a directed edge if the target root can be connected (via a chain of operations) in the underlying pattern to the source root. We place a restriction that the path connecting the two candidate roots must only contain the nodes in the subgraphs underneath these two roots. The cost of an edge is the smallest number of upward traversals (edges) required to go from the source to the target root, and the connector is a Value in the intersection of the two subtrees rooted at the source and target root that results in that smallest number of such upward traversals. Optimal root ordering is then formulated as the problem of finding a spanning arborescence (i.e., a directed spanning tree) of minimal weight.

In order to determine the spanning arborescence (directed spanning tree) of minimum weight, we use the Edmonds' algorithm. The worst-case computational complexity of this algorithm is O(_N_^3) for a single root, where _N_ is the number of specified roots. The pdl-to-pdl_interp lowering calls this algorithm as a subroutine _N_ times (once for each candidate root), so the overall complexity of root ordering is O(_N_^4). If needed, this complexity could be reduced to O(_N_^3) with a more efficient algorithm. However, note that the underlying implementation is very efficient, and _N_ in our instances tends to be very small (<10). Therefore, we believe that the proposed (asymptotically suboptimal) implementation will suffice for now.

Testing: a unit test of the algorithm

Diff Detail

Event Timeline

sfuniak created this revision.Aug 23 2021, 5:31 AM
sfuniak requested review of this revision.Aug 23 2021, 5:31 AM
Mogball added a subscriber: Mogball.Oct 7 2021, 8:23 PM
Mogball added inline comments.
mlir/lib/Conversion/PDLToPDLInterp/RootOrdering.cpp
57

nit: e = graph.end(); outer != e

214

Can you estimate a required size for reserve? Depending on the STL implementation, the initializer_list constructor will guarantee that the first insertion will need to re-allocate.

sfuniak updated this revision to Diff 378559.Oct 10 2021, 8:47 PM

Addressed review feedback.

rriddle added inline comments.Oct 12 2021, 12:01 AM
mlir/include/mlir/Conversion/PDLToPDLInterp/RootOrdering.h
1–3 ↗(On Diff #378559)

We generally don't expose implementation headers via the public include/ directory. Unit tests for these types of components generally just use relative include to the necessary header in the implementation folder, e.g. https://github.com/llvm/llvm-project/blob/5829ba7afc13eaa004f16906a5004a61648ac403/llvm/unittests/CodeGen/AllocationOrderTest.cpp#L9

32 ↗(On Diff #378559)

Is this still the case? Or are the roots selected from the match part of a pdl.pattern?

Also, do we have verification that all of the operations are connected in some form with a pdl.pattern? (i.e. when the root to the pdl.rewrite isn't provided)

62 ↗(On Diff #378559)
78 ↗(On Diff #378559)

Is there a benefit to using double map vs something like DenseMap<std::pair<Value, Value>, RootOrderingCost>?

96 ↗(On Diff #378559)

nit: Drop the explicit for multi-parameter constructors.

98–99 ↗(On Diff #378559)

Can you document the result value here?

mlir/lib/Conversion/PDLToPDLInterp/RootOrdering.cpp
34

nit: assert messages should start with lower case

71

Can you document the cost update here.

95

Won't this invalidate your current iterator? Could you switch this loop to use llvm::make_early_inc_range instead?

114–119

Can you move most of this documentation to the header instead?

141

Given that this is always true for the first iteration, switch to a do/while?

173

nit: Can you spell out auto here?

181

nit: Drop the std:: from size_t

189
197–199

Feel free to just inline this into the header.

219

Thanks, I will make the changes suggested.

mlir/include/mlir/Conversion/PDLToPDLInterp/RootOrdering.h
32 ↗(On Diff #378559)

Stale comment, will fix.

In general, the operations in the pdl.pattern must form a connected tree for this algorithm to succeed, whether we detect the root automatically or the user provides it explicitly. In the latter case, we still call this algorithm, on the provided root, in order to determine the best way to connect the roots. I believe the code asserts if the tree is not connected, but you are right in saying that we should verify this explicitly in the pdl.pattern. I will add the check to the last stack diff.

78 ↗(On Diff #378559)

Yes, it is quite important that these are stored as nested maps, so that you can iterate the in-neighbors of a single vertex efficiently.

mlir/lib/Conversion/PDLToPDLInterp/RootOrdering.cpp
95

You are right, thanks for pointing this out. I will make the change you suggested.

sfuniak marked 9 inline comments as done.Oct 17 2021, 5:24 PM
sfuniak added inline comments.
mlir/lib/Conversion/PDLToPDLInterp/RootOrdering.cpp
95

On the second thought, I find the llvm::make_early_inc_range too magical and (assuming we use range-based for loop) somewhat less performant, because we'd erase by the key rather than iterator. I will fix the bug differently.

173

parents is a DenseMap, and its value type is an implementation detail llvm::detail::DenseMapPair<KeyT, ValueT>. will add some more documentation instead.

sfuniak updated this revision to Diff 380913.Oct 20 2021, 5:41 AM

Addressed review feedback.

rriddle accepted this revision.Oct 28 2021, 11:27 AM

LG, thanks.

This revision is now accepted and ready to land.Oct 28 2021, 11:27 AM
sfuniak updated this revision to Diff 384963.Thu, Nov 4, 9:51 PM

Rebased.

This revision was automatically updated to reflect the committed changes.

The unit-tests seems leaky: https://lab.llvm.org/staging/#/builders/191/builds/2310

You should be able to reproduce locally with '-DLLVM_USE_SANITIZER=Address;Undefined' (and using clang as host compiler)

I am not able to reproduce the leak on Apple clang version 12.0.5 (running on M1). Will try on a different machine tomorrow.