This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Make sparse compiler more admissible.
ClosedPublic

Authored by Peiming on Sep 12 2022, 5:00 PM.

Details

Summary

Previously, the iteration graph was computed without priorities. This patch adds a heuristic to the iteration-graph computation: the topological sort starts with the reduction iterators, which makes reduction iterators (likely) appear as late as possible in the sorted array.

The current sparse compiler also fails to compile the newly added test case.

Diff Detail

Event Timeline

Peiming created this revision.Sep 12 2022, 5:00 PM
Herald added a project: Restricted Project. · View Herald Transcript
Peiming requested review of this revision.Sep 12 2022, 5:00 PM
Peiming edited the summary of this revision. (Show Details)
Peiming edited the summary of this revision. (Show Details)Sep 12 2022, 5:07 PM
Peiming updated this revision to Diff 459597.Sep 12 2022, 5:14 PM

fix comments

Peiming added inline comments.Sep 12 2022, 5:26 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
350

An optimal solution could be to compute all possible topological sort results and pick the best one, but I am afraid that would be very inefficient. Is there another known algorithm, that I am not aware of, that could be applied here?

Peiming updated this revision to Diff 459604.Sep 12 2022, 5:42 PM

update comments

Peiming added inline comments.Sep 12 2022, 5:46 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
350

An optimal solution could be to compute all possible topological sort results and pick the best one, but I am afraid that would be very inefficient. Is there another known algorithm, that I am not aware of, that could be applied here?

A priority queue could probably help to get an O(N log N) solution; would that be more desirable?

aartbik added inline comments.Sep 12 2022, 6:19 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
350

I actually have an allTopSorts() implementation in my local workspace that I used in the past to see which version to pick if there are several. I think n is usually small enough not to worry too much, but also note that we are actually still in the phase where we separate *policy* from *mechanism*, i.e. we make sure we can generate code but assume something else will tell us what choices to make, rather than introducing very advanced heuristics.

Perhaps we can do the same trick as we do for L1822, with an extra case that takes isAdmissible into account?
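The allTopSorts() code mentioned above is not part of this revision; as a rough illustration of the idea only (all names here are hypothetical, not the actual workspace code), enumerating every topological order of an iteration graph can be done with a standard backtracking search over in-degree-zero nodes:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of an allTopSorts()-style enumeration: at each step,
// pick any node whose in-degree is currently zero, recurse, then undo the
// choice. Worst case is exponential in n, but the number of loops n in a
// linalg op is usually small enough not to worry.
static void enumerate(const std::vector<std::vector<int>> &adj,
                      std::vector<int> &inDegree, std::vector<bool> &used,
                      std::vector<int> &current,
                      std::vector<std::vector<int>> &result) {
  const int n = static_cast<int>(adj.size());
  if (static_cast<int>(current.size()) == n) {
    result.push_back(current); // one complete topological order
    return;
  }
  for (int u = 0; u < n; ++u) {
    if (used[u] || inDegree[u] != 0)
      continue;
    used[u] = true;
    for (int v : adj[u])
      --inDegree[v];
    current.push_back(u);
    enumerate(adj, inDegree, used, current, result);
    // Undo, so the next candidate starts from the same state.
    current.pop_back();
    for (int v : adj[u])
      ++inDegree[v];
    used[u] = false;
  }
}

// Returns every topological order of the graph given as adjacency lists,
// where an edge u -> v means loop u must appear before loop v.
std::vector<std::vector<int>> allTopoSorts(
    const std::vector<std::vector<int>> &adj) {
  const int n = static_cast<int>(adj.size());
  std::vector<int> inDegree(n, 0);
  for (int u = 0; u < n; ++u)
    for (int v : adj[u])
      ++inDegree[v];
  std::vector<bool> used(n, false);
  std::vector<int> current;
  std::vector<std::vector<int>> result;
  enumerate(adj, inDegree, used, current, result);
  return result;
}
```

For example, with three loops and a single constraint 0 -> 1 this yields the three admissible orders [0,1,2], [0,2,1], and [2,0,1], from which a cost model could then pick the best one.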

Peiming updated this revision to Diff 459622.Sep 12 2022, 7:03 PM

Replace heuristics with the optimal solution.

Peiming added inline comments.Sep 12 2022, 7:05 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
235

We do not really need a priority queue in our case; since we only have a limited set of values to sort, this bucket-sort-like approach is faster and more straightforward.
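The bucket idea can be sketched as a Kahn-style worklist topological sort with one bucket per iterator kind, draining parallel iterators before reduction iterators so that reductions end up as late as the constraints allow. This is only an illustration of the approach under discussion, not the actual Sparsification.cpp code; the names are made up:

```cpp
#include <cassert>
#include <queue>
#include <vector>

enum class IterKind { kParallel, kReduction };

// Kahn's algorithm over the iteration graph (edge u -> v: loop u must come
// before loop v). Among the currently available (in-degree zero) nodes,
// parallel iterators are always drained before reduction iterators, which
// pushes reductions as late in the order as the constraints allow. With only
// two iterator kinds, two plain queues replace a priority queue, giving O(N+E)
// instead of O(N log N).
std::vector<int> topoSortReductionsLast(
    const std::vector<std::vector<int>> &adj,
    const std::vector<IterKind> &kind) {
  const int n = static_cast<int>(adj.size());
  std::vector<int> inDegree(n, 0);
  for (int u = 0; u < n; ++u)
    for (int v : adj[u])
      ++inDegree[v];
  // Bucket 0: parallel iterators, bucket 1: reduction iterators.
  std::queue<int> buckets[2];
  for (int u = 0; u < n; ++u)
    if (inDegree[u] == 0)
      buckets[kind[u] == IterKind::kReduction].push(u);
  std::vector<int> order;
  while (!buckets[0].empty() || !buckets[1].empty()) {
    // Prefer a parallel iterator whenever one is available.
    std::queue<int> &q = buckets[0].empty() ? buckets[1] : buckets[0];
    int u = q.front();
    q.pop();
    order.push_back(u);
    for (int v : adj[u])
      if (--inDegree[v] == 0)
        buckets[kind[v] == IterKind::kReduction].push(v);
  }
  return order; // fewer than n entries means the graph has a cycle
}
```

For instance, with no ordering constraints at all, a reduction iterator is emitted after every parallel one; a constraint forcing the reduction early is still honored, since the buckets only break ties among currently available nodes.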

Peiming updated this revision to Diff 459802.Sep 13 2022, 10:41 AM

+ try a less strict constraint for topo sort if the op is inadmissible

Peiming added inline comments.Sep 13 2022, 10:54 AM
mlir/test/Dialect/SparseTensor/sparse_sddmm.mlir
162

It should have better spatial locality here compared to the previously generated code, right?

Peiming added inline comments.Sep 13 2022, 10:59 AM
mlir/test/Dialect/SparseTensor/sparse_sddmm.mlir
161–162

These two accesses have worse spatial locality: one is accessed by row and the other by column.

aartbik added inline comments.Sep 13 2022, 2:07 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
228

interator -> iterator

230

A nice side-effect of using the "worklist" is that you also removed the recursion, which will make a lot of people happy ;-)

251

in-degree

276

make this

if (b) {

} else if (!adjM...) {

}

304–305

in-degree

339

Could we have achieved a similar heuristic by adding one extra "mask" layer in which we add parallel -> reduce edges?
It is not completely the same, but it feels like we are using heuristics at various levels now.

Conversely, with the new heuristic could we get rid of this "unrelated" step?

1821

Keep the original comment for this block

// Builds the tensor expression for the Linalg operation in SSA form.

1842

Make the nested if a && on conditions for shorter code

mlir/test/Dialect/SparseTensor/sparse_sddmm.mlir
162

We actually end up with better code here, since the expand/compress is better for keeping the resulting matrix sparse.
However, ironically, we end up with a par-red-par loop, whereas the original was par-par-red.

Peiming updated this revision to Diff 459872.Sep 13 2022, 2:25 PM
Peiming marked 8 inline comments as done.

Address comments

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
339

I think it might be hard, because what we really care about is only the *first* reduction iterator.

Also, I am not sure when and how the Undef level is used here; maybe you are the better person to answer that ;-)

mlir/test/Dialect/SparseTensor/sparse_sddmm.mlir
162

Lol, yeah. But we picked a stricter (better) set of ordering constraints.

Peiming marked an inline comment as done.Sep 13 2022, 3:01 PM
aartbik accepted this revision.Sep 13 2022, 3:25 PM
aartbik added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
339

Yeah, I think we should try removing this and keep the heuristics simple (dense -> sparse with mask), and then apply your much better (parallel < reduction) ordering for each invocation. But we can try removing that after this revision goes in....

This revision is now accepted and ready to land.Sep 13 2022, 3:25 PM
This revision was landed with ongoing or failed builds.Sep 14 2022, 8:59 AM
This revision was automatically updated to reflect the committed changes.