This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] refine heuristic for iteration graph topsort
ClosedPublic

Authored by aartbik on Sep 1 2021, 3:31 PM.

Details

Summary

The sparse index order must always be satisfied, but this
may still leave a choice among several valid topological
sorts. We used to break ties in favor of any dense index
order, since this gives good locality. However, breaking
ties in favor of pushing unrelated indices into sparse
iteration spaces gives better asymptotic complexity. This
revision improves the heuristic accordingly.

Note that in the long run, we are really interested in using
ML for ML to find the best loop ordering as a replacement for
such heuristics.
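
The mechanism described above can be sketched as a mask-guided topological sort: mandatory sparse-order constraints are always kept, while optional (e.g. dense-locality) constraints are admitted only when their bit is set in a mask, so the caller can relax the graph if it turns out cyclic. The names, values, and structure below are illustrative assumptions, not the actual Sparsification.cpp code:

```cpp
#include <optional>
#include <queue>
#include <vector>

// Hypothetical mask bits; values are illustrative, not the real SortMask.
enum SortMask : unsigned { kSparseOnly = 0x0, kIncludeDense = 0x1, kIncludeUndef = 0x2 };

// A loop-ordering constraint "from must come before to"; kind == 0 means
// mandatory (sparse order), otherwise a bit identifying an optional kind.
struct Constraint { unsigned from, to, kind; };

// Kahn's algorithm over the constraints admitted by `mask`;
// returns std::nullopt if the resulting graph is cyclic.
std::optional<std::vector<unsigned>>
topSort(unsigned numLoops, const std::vector<Constraint> &cs, unsigned mask) {
  std::vector<std::vector<unsigned>> adj(numLoops);
  std::vector<unsigned> indeg(numLoops, 0);
  for (const auto &c : cs) {
    if (c.kind != 0 && !(c.kind & mask))
      continue; // optional constraint filtered out by the mask
    adj[c.from].push_back(c.to);
    ++indeg[c.to];
  }
  std::queue<unsigned> ready;
  for (unsigned i = 0; i < numLoops; ++i)
    if (indeg[i] == 0)
      ready.push(i);
  std::vector<unsigned> order;
  while (!ready.empty()) {
    unsigned n = ready.front();
    ready.pop();
    order.push_back(n);
    for (unsigned m : adj[n])
      if (--indeg[m] == 0)
        ready.push(m);
  }
  if (order.size() != numLoops)
    return std::nullopt; // cycle: the admitted constraints conflict
  return order;
}
```

Starting from the full mask and relaxing on failure yields the locality-friendly order when it exists, and falls back to a sparse-only order otherwise.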

Diff Detail

Event Timeline

aartbik created this revision.Sep 1 2021, 3:31 PM
aartbik requested review of this revision.Sep 1 2021, 3:31 PM

Not surprisingly, this shows a speedup of, for example, 5.9x on sampled dense dense for a sparse matrix like jpwh_991. But this is just because we managed to pick the wrong topsort before this fix ;-)

aartbik updated this revision to Diff 370371.Sep 2 2021, 12:49 PM

rebased with main

bixia accepted this revision.Sep 2 2021, 2:32 PM
bixia added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
157–158

This doesn't match the code exactly now. Maybe change to "the dense constraint bit is not set" or the like?

158–159

Shall we name the constants and use those names to replace the "magic constants"?

const int kDenseConstraintBit = 0x1;
const int kSparseOuterConstraintBit = 0x2;

0x3 => kDenseConstraintBit | kSparseOuterConstraintBit

etc
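
The suggestion above could be made concrete as a couple of named bits; the names and values here follow the reviewer's sketch (the committed code ended up using a `SortMask` enum with different names instead):

```cpp
// Named constraint-mask bits, per the reviewer's proposal above
// (illustrative; not the identifiers used in the committed code).
const int kDenseConstraintBit = 0x1;
const int kSparseOuterConstraintBit = 0x2;

// The magic constant 0x3 then reads as a union of the named bits.
const int kAllConstraintBits = kDenseConstraintBit | kSparseOuterConstraintBit;
```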

mlir/test/Dialect/SparseTensor/sparse_2d.mlir
1053

Shall we add a comment to this test, and maybe also rename it, to make explicit that the test ensures the dense dimension k is the inner loop in the generated code for performance?

This revision is now accepted and ready to land.Sep 2 2021, 2:32 PM
aartbik marked 3 inline comments as done.Sep 2 2021, 5:21 PM
aartbik added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
157–158

Ah, it is in a way the same, but yes, rephrased the comment.

mlir/test/Dialect/SparseTensor/sparse_2d.mlir
1053

Well, it is called sampled_dense_dense, right? And these 1d/2d/3d files usually have few comments on what to expect (other than what is expressed with CHECKs).

aartbik updated this revision to Diff 370453.Sep 2 2021, 5:21 PM
aartbik marked 2 inline comments as done.

used named bits in masks

bixia added inline comments.Sep 2 2021, 8:52 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
1150–1156

Would it be better to write this as a loop, from SortMask::kIncludeUndef | SortMask::kIncludeDense down to 0 with step -1?

aartbik marked an inline comment as done.Sep 3 2021, 8:34 AM
aartbik added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
1150–1156

I agree that in the original form, it was just screaming "for-loop" here! But now with the symbolic bit masks, I think the spelled out version makes more sense.

This revision was automatically updated to reflect the committed changes.
aartbik marked an inline comment as done.
wrengr added inline comments.Sep 3 2021, 2:01 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
1150–1156

I still kinda feel that using a for-loop would be clearer, since the logic is "if all the options fail, then return failure()". The only concern with a for-loop is if we're relying on a specific ordering to improve the effectiveness of short-circuiting; but I think that can be resolved with a comment (or if things get really hairy in the future then defining a custom iterator to make sure to enumerate things in the right order).
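
The for-loop shape under discussion might look like the sketch below, which tries masks from all optional constraints down to none and returns the first feasible one. Everything here is illustrative: `tryMask` stands in for `computeIterationGraph`, and the enum values are assumptions, not the actual code:

```cpp
// Hypothetical mask bits (illustrative values only).
enum SortMask : unsigned { kIncludeDense = 0x1, kIncludeUndef = 0x2 };

// Stand-in for computeIterationGraph: pretend only the fully relaxed
// graph (no optional constraints) is acyclic in this toy example.
static bool tryMask(unsigned mask) { return mask == 0; }

// Loop form: try masks from strictest (most constraints) to loosest,
// returning the first mask that yields an acyclic iteration graph,
// or -1 if all options fail.
static int firstFeasibleMask() {
  for (int mask = kIncludeUndef | kIncludeDense; mask >= 0; --mask)
    if (tryMask(static_cast<unsigned>(mask)))
      return mask;
  return -1;
}
```

Note the ordering concern raised above: the descending loop only coincides with the intended "prefer more constraints" priority as long as the bit values encode that priority, which is worth a comment either way.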