This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] fix performance bug in matmul with a sparse rhs due to suboptimal iteration graphs.
ClosedPublic

Authored by Peiming on Feb 27 2023, 5:18 PM.

Details

Summary

While dense tensors support random accesses, it is critical to visit them in a row-major order for better cache locality. However, we previously consider dense inputs and outputs together when computing constraints for building iteration graph, it could lead us to less efficient iteration graphs.

This patch adds a new SortMask::kIncludeDenseInput to treat dense inputs/outputs separately when building iteration graph, thus increasing the chance for use to construct a better iteration graph.

A more fine-grained approach is to treat each input separately.

Note, related to:
https://github.com/llvm/llvm-project/issues/51651

Diff Detail

Event Timeline

Peiming created this revision.Feb 27 2023, 5:18 PM
Herald added a project: Restricted Project. · View Herald Transcript
Peiming requested review of this revision.Feb 27 2023, 5:18 PM
Peiming updated this revision to Diff 500983.Feb 27 2023, 5:28 PM

update comments.

Peiming added inline comments.Feb 27 2023, 5:30 PM
mlir/test/Dialect/SparseTensor/sparse_2d.mlir
1034

Inefficient accessing pattern

mlir/test/Dialect/SparseTensor/sparse_kernels.mlir
177

Inefficient accessing pattern

226–274

Inefficient access pattern

mlir/test/Dialect/SparseTensor/sparse_sddmm.mlir
90

Inefficient access pattern

wrengr added inline comments.Feb 27 2023, 6:03 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
458–472

Since the bitmasks overlap, it's a bit hard to reason about these three conditions, and it'd be better to rearrange them to make things easier to follow. If I'm reading things correctly then the following should be logically equivalent to the current code:

if (!enc) {
  const bool lacksDenseInput = !(mask & kIncludeDenseInput);
  const bool lacksDenseOutput = !(mask & kIncludeDenseOutput);
  const bool hasDpsInput = env.op().isDpsInput(&t);
  if ( (lacksDenseInput && lacksDenseOutput)
    || (lacksDenseInput && hasDpsInput)
    || (lacksDenseOutput && !hasDpsInput)
  ) continue;
}

That's still not particularly easy to follow, but it at least helps clarify things, and it gives a good place to start for rearranging the condition to be easier to reason about.

please add some rationale to the description on what has changed
(we have gone back and forth on loop ordering quite a bit already ;-)

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
50

I would organize these now as follows

  • individual tests b001 b010 b100
  • sets passed for progressive lowering b111 (all) bxxx (some other sets) b000 (none, only sparse)

or, if you don't like having them in the same place,
define the individual bits in the SortMask and define some consts for the "sets"

459

note that we also have mask testing at L515 and such, making this all a bit hard to follow
(since some masks are mutually exclusive dense/sparse but some apply to both)

Peiming edited the summary of this revision. (Show Details)Feb 27 2023, 8:25 PM
Peiming updated this revision to Diff 501187.Feb 28 2023, 9:36 AM
Peiming marked an inline comment as done.

address comments.

Peiming marked 2 inline comments as done.Feb 28 2023, 9:36 AM
aartbik edited the summary of this revision. (Show Details)Feb 28 2023, 11:43 AM
aartbik added inline comments.Feb 28 2023, 11:47 AM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
52

maybe add a comment line seperator, as in

The individual mask bits.
1, 2, 4
The subsets of mask bits
7, 3, 0

465

I also feel like we could do without this at some point ;-)

1550–1555

The earlier a constraint mask ....

But... I would say this is still very heuristic driven, no guarantees ;-)

mlir/test/Dialect/SparseTensor/sparse_2d.mlir
1012

why did you remove the sparsity CHECK

That seems very unrelated to this revision

mlir/test/Dialect/SparseTensor/sparse_kernels.mlir
206–210

same question

257

and here

mlir/test/Dialect/SparseTensor/sparse_sddmm.mlir
72

and here

Peiming added inline comments.Feb 28 2023, 11:51 AM
mlir/test/Dialect/SparseTensor/sparse_2d.mlir
1012

Simply because it is too long... I can add them back if you want

Peiming updated this revision to Diff 501252.Feb 28 2023, 12:05 PM
Peiming marked 4 inline comments as done.
Peiming edited the summary of this revision. (Show Details)

address comment and revert unrelated changes.

aartbik accepted this revision.Feb 28 2023, 12:10 PM
aartbik added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
1550–1555

typo remains: the eariler a constrait mask

This revision is now accepted and ready to land.Feb 28 2023, 12:10 PM
Peiming updated this revision to Diff 501256.Feb 28 2023, 12:10 PM
Peiming marked an inline comment as done.

fix typo

Peiming marked 2 inline comments as done.Feb 28 2023, 12:13 PM
This revision was landed with ongoing or failed builds.Feb 28 2023, 1:02 PM
This revision was automatically updated to reflect the committed changes.