This is an archive of the discontinued LLVM Phabricator instance.

[mlir] [sparse] convenience runtime support to read Matrix Market format
ClosedPublic

Authored by aartbik on Oct 5 2020, 3:25 PM.

Details

Summary

Setting up input data for benchmarks and integration tests can be tedious in
pure MLIR. With more sparse tensor work planned, this convenience library
simplifies reading sparse matrices in the popular Matrix Market Exchange
Format (see https://math.nist.gov/MatrixMarket). Note that this library
is *not* part of core MLIR. It is merely intended as a convenience library
for benchmarking and integration testing.

Diff Detail

Event Timeline

aartbik created this revision.Oct 5 2020, 3:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2020, 3:25 PM
aartbik requested review of this revision.Oct 5 2020, 3:25 PM

I'm new to MLIR, so sorry in advance for noob MLIR comments!

mlir/integration_test/Sparse/CPU/matrix-market-example.mlir
28

To make this more generic, shouldn't a be allocated after we know m and n?

49

I think there should be a separate routine to convert sparse matrix to dense, i.e., one routine to read Matrix Market entries into 3 arrays, then another to convert these arrays to a dense matrix. If we materialize the dense matrix in the sparse matrix read routine, we wouldn't be able to read many large sparse matrices from real applications.

This would allow us to just read and store all entries in 3 arrays at once (instead of reading entry-by-entry): i[nnz], j[nnz], and val[nnz].
The official Matrix Market I/O C library can do that for us: See mm_read_unsymmetric_sparse.
This will avoid the need to check if a different file is already open as well.

70

Noob question, why isn't there a CHECK-SAME in front of ( 1, 0, 0, 1.4, 0 ) just like the four lines below?

mlir/integration_test/data/test.mtx
14

Great idea making the coordinates not sorted! This makes for a good test case for when we want to convert to CSR, etc. :)

mlir/lib/ExecutionEngine/SparseUtils.cpp
38

(If we are still going to have this routine)
Should we use the official library to read the header (mm_read_banner)? It covers all cases.

82

Should this be inline?

aartbik marked 6 inline comments as done.

Welcome to this repo, Penporn!
And thanks for the comments.

mlir/integration_test/Sparse/CPU/matrix-market-example.mlir
28

Yes, you are right. I was a bit lazy anticipating the size, but for illustration purposes, adapting to the size would be better. I changed this to read more like an example. Thanks!

49

Note that the purpose of this library is not to provide deep support for sparse computations, but merely something light weight that makes it a bit easier to set up tests and benchmarks.

I did it this way to avoid a difficult interaction between memory allocation at MLIR level and in the C library. By "coming back first" after reading the header, all allocation can be done in MLIR.

70

The -SAME part indicates that the CHECK continues at the same line as the previous CHECK.
The output for this program is:

5
5
9
( ( 1, 0, 0, 1.4, 0 ), ( 0, 2, 0, 0, 2.5 ), ( 0, 0, 3, 0, 0 ), ( 4.1, 0, 0, 4, 0 ), ( 0, 5.2, 0, 0, 5 ) )

mlir/lib/ExecutionEngine/SparseUtils.cpp
38

I strongly prefer our own very light-weight implementation over pulling in sources from elsewhere.
We can support the other formats as the need arises.

82

If you prefer, but given the static scope, the compiler could already decide to inline this based on its heuristics.

aartbik updated this revision to Diff 296514.Oct 6 2020, 12:21 PM
aartbik marked 5 inline comments as done.

modified example to illustrate usage better

penpornk accepted this revision.Oct 6 2020, 1:12 PM

Thank you for the clarifications!

This revision is now accepted and ready to land.Oct 6 2020, 1:12 PM

Thank *you* for the review, I feel it indeed has made the example more clear!

Very cool @aartbik , nice use of MLIR / C interop to avoid hardcoding stuff in IR prematurely.

mlir/integration_test/Sparse/CPU/matrix-market-example.mlir
59

This is a nice minimal first step to think about for adding sparse functionality to Linalg.

There are a few things going on that would be nice to separate:

  1. indirection for sparse read accesses
  2. dense writes (i.e. no extra indirection)
  3. dynamic collapsing logic between k loop over nnz and "i, j loops".

Let's chat in our next VC.

mehdi_amini added inline comments.Feb 10 2021, 4:49 PM
mlir/lib/ExecutionEngine/SparseUtils.cpp
43

This whole API is forcing us into using files, while we ought to work with a buffer abstraction in general (i.e. we can get a file or a buffer from another source, but the API should abstract it).

Can you refactor this and use something like llvm::MemoryBuffer for example?

aartbik marked an inline comment as done.Feb 10 2021, 7:04 PM
aartbik added inline comments.
mlir/lib/ExecutionEngine/SparseUtils.cpp
43

I like this idea (did not know about the built-in mem buffer support).
This will also make injecting test files a bit easier than currently is the case.
Thanks!

Coming up in next sparse utils enhancements....