This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] sparse tensor storage implementation
ClosedPublic

Authored by aartbik on Feb 1 2021, 9:39 PM.

Details

Summary

This revision connects the generated sparse code with an actual
sparse storage scheme, which can be initialized from a test file.
Lacking a first-class citizen SparseTensor type (with buffer),
the storage is hidden behind an opaque pointer with some "glue"
to bring the pointer back to tensor land. Rather than generating
sparse setup code for each different annotated tensor (viz. the
"pack" methods in TACO), a single "one-size-fits-all" implementation
has been added to the runtime support library. Many details and
abstractions need to be refined in the future, but this revision
allows full end-to-end integration testing and performance
benchmarking (with, on one end, an annotated Linalg
op and, on the other end, a JIT/AOT executable).

Diff Detail

Event Timeline

aartbik created this revision.Feb 1 2021, 9:39 PM
aartbik requested review of this revision.Feb 1 2021, 9:39 PM
aartbik updated this revision to Diff 320674.Feb 1 2021, 10:05 PM

removed some testing code

aartbik updated this revision to Diff 320897.Feb 2 2021, 1:32 PM

rebase + refine

bixia requested changes to this revision.Feb 2 2021, 9:06 PM
bixia added a comment.Feb 2 2021, 9:06 PM

It is a very nice PR, thanks for the great work! I have finished one round of reading the code but haven't read the MLIR tests yet; I would like to send out the comments I have so far.

mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
524

I think the second line of the comment is not necessary: this routine checks whether the operand is in the form of a sparse tensor storage representation, and a dense tensor can certainly be represented via sparse tensor storage.

534–535

Sorry, I don't understand the last part of the comment starting from "or ...".
Is it true that what we want to say here is "the routine returns a buffer for the output and the buffer is initialized"?

578

I think we need to remove this line, because in line 618 the way allDense is used should just correspond to whether the tensor is in SparseTensorStorage representation. Am I right here? If so, I would recommend consolidating the two variables, atSparse and allDense, into one variable isSparseTensor.

mlir/lib/ExecutionEngine/SparseUtils.cpp
106

implement

128

Initializes

168

pass

288–289

struct

This revision now requires changes to proceed.Feb 2 2021, 9:06 PM
aartbik updated this revision to Diff 321014.Feb 2 2021, 11:01 PM
aartbik marked 7 inline comments as done.

addressed comments

Thanks for the speedy review, Bixia!

mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
524

Okay, removed. It was to contrast "true" dense tensors (i.e. the ones that are marked as such in the annotations and are passed as true tensor types) with "test" dense tensors (i.e. marked as dense but passed as an opaque pointer that is set up with an artificial all-dense sparse storage scheme just for stress testing). But I removed the comment.

534–535

Set to your suggestion. I was trying to give too much background information here, but got lost in the woods of details.

578

Good observation, but this is a bit tricky. There are two sources of sparsity: the annotations, and the fact that we feed in a "glue" node from opaque pointer to tensor. The latter is relatively new and only temporary (it will go away once we have a proper sparse type). So most tests will still need allDense to be set based on the annotations, but the opaque pointer needs the different DimOp. Once we have one type, this will indeed converge to one variable again.

ftynse added a comment.Feb 3 2021, 1:31 AM

Looks okay in general. I would like some higher-level description of the storage scheme somewhere.

Please do not commit until @nicolasvasilache had an opportunity to look at Linalg ops.

mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
525 ↗(On Diff #321014)

All these could use some examples in the let description and a better specification of what the underlying assumptions are.

539 ↗(On Diff #321014)

Nit: we use arg-type to result-type naming for cast-like operations; better to be consistent.

552 ↗(On Diff #321014)

Are we sure these are going to be side-effect free in all cases? This seems to assume a specific underlying storage scheme for tensors, at which point it may be possible to just extract a pointer and wrap it into a memref. If the actual scheme is different, it may be allocating a memref and copying data into it...

575 ↗(On Diff #321014)

Array sounds like it is 1D.

576 ↗(On Diff #321014)

I feel like there should be more invariants on some of these ops. Can it really convert _any_ tensor into _any_ strided memref? A 4D tensor into a 2D column-major memref?

mlir/lib/Dialect/Linalg/Transforms/SparseLowering.cpp
7

Something went wrong here.

26–28

If you take ValueRange instead of ArrayRef<Value> (which you should anyway, unless there is a strong reason not to), you can drop this and call operands.getTypes().
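
As an illustration, a minimal sketch of the ValueRange idiom (the helper name is hypothetical, not from this patch):

// Sketch; assumes the usual MLIR headers for Type/ValueRange and
// llvm/ADT/SmallVector.h for to_vector.
static llvm::SmallVector<mlir::Type, 4>
getOperandTypes(mlir::ValueRange operands) {
  // ValueRange exposes the operand types directly; with
  // ArrayRef<Value> one would collect value.getType() in a loop.
  return llvm::to_vector<4>(operands.getTypes());
}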

78–82

Nit: LLVM-style early return would be something like

if (!eltType.isIndex() && !eltType.isInteger(64))
  return rewriter.notifyMatchFailure(op, "unsupported element type");
StringRef name = "sparsePtrsI64";

mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
526

Nit: we have isa_and_nonnull to avoid the actual cast when you don't need the result
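
For illustration, a sketch of the before/after (GlueOp is a hypothetical stand-in for the op being tested; assumes MLIR headers):

// Before: a null check plus a dyn_cast whose result is never used.
//   if (val.getDefiningOp() && dyn_cast<GlueOp>(val.getDefiningOp()))
// After: one predicate call, no cast result materialized.
static bool isGlued(mlir::Value val) {
  return llvm::isa_and_nonnull<GlueOp>(val.getDefiningOp());
}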

mlir/lib/ExecutionEngine/SparseUtils.cpp
111

Nit: we prefer taking non-nullable mutable arguments by reference
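
For illustration, the convention being referenced (signatures hypothetical):

#include <cstdint>
#include <cstdio>
#include <vector>

// By pointer: the signature suggests the argument may be null.
void readHeader(std::FILE *file, std::vector<uint64_t> *sizes);
// By reference: clearly non-nullable and mutated in place.
void readHeader(std::FILE *file, std::vector<uint64_t> &sizes);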

132

Do we want to make this function private?

175

Maybe make the element type a template parameter of the enclosing class?

297

MemRef1DU64 please, we may need other ranks.
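
For context, a sketch of what such a rank-1 descriptor struct could look like (field names follow the MemRef1DI1 usage later in this review; the exact layout in the patch may differ):

#include <cstdint>

// Scalarized rank-1 strided memref descriptor: allocated pointer,
// aligned pointer, offset, one size, one stride. Higher ranks carry
// one size/stride pair per dimension.
struct MemRef1DU64 {
  const uint64_t *base; // allocated pointer
  const uint64_t *data; // aligned pointer
  uint64_t off;         // offset into the data
  uint64_t size;        // number of elements
  uint64_t stride;      // distance between consecutive elements
};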

bixia added inline comments.Feb 3 2021, 8:09 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
575 ↗(On Diff #321014)

I think it is 1D. See also my response to the comment below.

576 ↗(On Diff #321014)

I think the answer is YES, and the stride is trivially 1. The three Linalg_SparseTensorToXYZMemRefOp here convert SparseTensorStorage.positions[x]/indices[y]/values to a MemRefOp, as can be seen from the C APIs in the same source file.

ftynse added inline comments.Feb 3 2021, 8:36 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
575 ↗(On Diff #321014)

If it is indeed 1D, then we should say so in the result type constraint or the verifier.

576 ↗(On Diff #321014)

A 2D memref has an offset, 2 sizes and 2 strides. What are those numbers in the case of a 4D tensor input? An 8D tensor input? Where do they come from? I see how this can be converted to a 1D memref<?xType> with an external indexing scheme (which still requires verifying that the operand and the result types have the same element type rather than accepting _anything_).
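
For reference, the rank-2 descriptor implied by MLIR's strided memref convention (mirroring StridedMemRefType from mlir/ExecutionEngine/CRunnerUtils.h); none of these fields are determined by a 4D or 8D tensor operand:

#include <cstdint>

// A 2D memref carries an offset plus two size/stride pairs in
// addition to the two data pointers.
template <typename T>
struct StridedMemRef2D {
  T *basePtr;         // allocated pointer
  T *data;            // aligned pointer
  int64_t offset;     // offset into data
  int64_t sizes[2];   // dimension sizes
  int64_t strides[2]; // dimension strides
};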

aartbik marked 16 inline comments as done.Feb 3 2021, 10:15 AM

Thanks all for the very detailed and useful feedback!

mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
525 ↗(On Diff #321014)

You are right. These ops are probably short-lived until we get a SparseType with ops, but even then we can carry the documentation over. Added.

552 ↗(On Diff #321014)

You are right, it is safer to remove this given the opaque implementation for now. It does not matter much for performance anyway, since the sparse compiler always hoists these to the top loop.

576 ↗(On Diff #321014)

Any tensor in (since we still live in dense tensor land), but only 1D arrays out. I added that constraint so legality is verified properly.

mlir/lib/Dialect/Linalg/Transforms/SparseLowering.cpp
7

Ah, good catch! Thanks.

26–28

Elegant. Thanks.

78–82

Agreed in general, but this is a bit forward-looking toward adding more types here (which would make the early return a bit harder).

mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
526

Also more elegant. Thanks.

578

On second thought, I don't think we need the specialized DimOp, and thus this logic. Simplified again.

mlir/lib/ExecutionEngine/SparseUtils.cpp
175

You are way ahead of me. This class will indeed be templated by value type, index type, and pointer type in the future (for now I just showed how the default f64/index/index case would be lowered). But I added this just to indicate the future direction.
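
A minimal sketch of that direction, assuming pointer, index, and value types as template parameters (names hypothetical; not the code in this revision):

#include <cstdint>
#include <vector>

// P = pointer type, I = index type, V = value type. Positions and
// indices are kept per dimension; values is one flat array of the
// nonzero entries.
template <typename P, typename I, typename V>
class SparseTensorStorage {
public:
  std::vector<std::vector<P>> positions;
  std::vector<std::vector<I>> indices;
  std::vector<V> values;
};

// The default f64/index/index case shown in this revision would be:
using DefaultStorage = SparseTensorStorage<uint64_t, uint64_t, double>;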

aartbik marked 8 inline comments as done.EditedFeb 3 2021, 10:18 AM

Alex @ftynse, it would also be nice to use the memref in the input parameters, for example

void *newSparseTensor(char *filename, MemRef1DI1 ref) {
  assert(ref.stride == 1);
  return newSparseTensorC(filename, ref.base + ref.off);
}

instead of

void *newSparseTensor(char *filename, bool *abase, bool *adata, uint64_t aoff,
                      uint64_t asize, uint64_t astride) {
  assert(astride == 1);
  return newSparseTensorC(filename, abase + aoff);
}

but this seems ABI-incompatible with the way arguments are passed C-style for structs.
Do you know of any clever trick to make this work? It would read easier...

aartbik updated this revision to Diff 321179.Feb 3 2021, 11:50 AM

addressed more comments

ftynse added a comment.Feb 8 2021, 4:23 AM

Add the llvm.emit_c_interface attribute to the relevant functions; this will generate _mlir_ciface_<original_name> wrappers that take a pointer to the struct, unpack it, and forward the descriptor components to the actual call. More info here: https://mlir.llvm.org/docs/LLVMDialectMemRefConvention/#c-compatible-wrapper-emission
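
In terms of the earlier example, a sketch of what this buys (assuming the MemRef1DI1 struct and newSparseTensorC helper from the snippets above; the symbol name follows the documented _mlir_ciface_ convention):

#include <cassert>

// With llvm.emit_c_interface on the MLIR-side declaration, the
// generated code packs the memref descriptor and calls this symbol
// with a pointer to it, so the runtime takes the struct directly.
extern "C" void *_mlir_ciface_newSparseTensor(char *filename,
                                              MemRef1DI1 *ref) {
  assert(ref->stride == 1);
  return newSparseTensorC(filename, ref->base + ref->off);
}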

Ah, thanks for the pointer to a detailed description (I had seen examples of this). This wrapper nicely hides these details, but I was really hoping for some direct C magic to either let MLIR pass by struct or let C scalarize the struct :-)

This is quite tricky, as the way C-level types map to the LLVM level depends on the ABI, and all this logic is built into clang itself. It would be a nice project to extract this from clang and make it reusable though :)

Ah, perhaps a cool summer intern starter project ;-)

@nicolasvasilache, since you were explicitly called out, would you mind having a look at the Linalg ops?
I suspect you are okay, since these form short-lived glue until we have a sparse tensor type, but perhaps you have some other insights?

aartbik updated this revision to Diff 322234.Feb 8 2021, 4:03 PM

rebasing to avoid bit-rot

nicolasvasilache requested changes to this revision.Feb 9 2021, 1:26 AM

Please introduce the new ops independently, with roundtrip, verifier, and independent testing, so their behavior is well understood and tested, not just assumed by a single-file compiler.
The amount of tests involved here is quite scary to me: I assume it is well understood that they will be unsupported in the future as the proper refactoring work starts?

mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
453 ↗(On Diff #322234)

Let's please isolate these ops in a new LinalgSparseOps.td for now.
Once things are integrated and compose at the level of transformations, we can pull the abstractions that make the cut into LinalgOps.td.

491 ↗(On Diff #322234)

tensor_to_memref folds away as bufferization occurs.
This does not fold away but rather gets converted to calls and ends up being executed.

At lowering time I see: `// Lower sparse primitives to calls into runtime support library.`

Please improve the doc so we understand what this does in practice.
This requires some diagrams with what data is input and what data is output.

The way we proceed in MLIR for adding such ops is to add them one by one with proper roundtrip and invalid tests, plus, in your case, execution tests.

Once the op semantics are well-understood and tested in isolation, composition becomes possible.

I am not pasting this for each op but please do treat it as a comment for all ops that actually materialize into executable IR: they need to be introduced independently with all the supporting testing / semantics.

This revision now requires changes to proceed.Feb 9 2021, 1:26 AM
aartbik marked 2 inline comments as done.Feb 9 2021, 1:13 PM

mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
453 ↗(On Diff #322234)

I have moved this into its own file now, and added more documentation to the ops themselves (but without going to the public Linalg doc for now). I also added roundtrip and isolated lowering tests.

491 ↗(On Diff #322234)

I modified the doc and added a diagram at the top of the new file.

aartbik updated this revision to Diff 322482.Feb 9 2021, 1:13 PM
aartbik marked 2 inline comments as done.

isolated sparse linalg ops in separate file, added roundtrip and lowering tests

bixia added inline comments.Feb 9 2021, 3:24 PM
mlir/include/mlir/Dialect/Linalg/IR/LinalgSparseOps.td
22

I see an extra "em" in this word.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
1046

s/Set/Sets/

bixia accepted this revision.Feb 9 2021, 3:27 PM
aartbik marked 2 inline comments as done.Feb 9 2021, 4:22 PM

Thanks Bixia!

mlir/include/mlir/Dialect/Linalg/IR/LinalgSparseOps.td
22

Right you are! Thanks for catching.

aartbik updated this revision to Diff 322538.Feb 9 2021, 4:34 PM
aartbik marked an inline comment as done.

fixed typos

This revision is now accepted and ready to land.Feb 10 2021, 11:16 AM
bondhugula added inline comments.
mlir/lib/Dialect/Linalg/Transforms/SparseLowering.cpp
120–123

Nit: invert the condition, return failure(), and drop the else.
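
A sketch of the suggested shape (condition and helper hypothetical):

#include "mlir/Support/LogicalResult.h"

// Bail out first; the main lowering path stays unindented.
static mlir::LogicalResult lower(bool supported) {
  if (!supported)
    return mlir::failure();
  // ... lowering logic continues at the top level ...
  return mlir::success();
}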

mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
532

Inputs / read-only args should be passed by const reference. const CodeGen &codegen?

mlir/lib/ExecutionEngine/SparseUtils.cpp
173–175

All of these are missing doc comments; use a triple-slash doc comment instead of the trailing one.

401–404

Regular // comment block.

This revision was automatically updated to reflect the committed changes.
aartbik marked 4 inline comments as done.Feb 10 2021, 12:03 PM

Thanks Nicolas and Bixia. I saw some last-minute feedback come in after submitting; I will address that in follow-up changes.

mlir/lib/Dialect/Linalg/Transforms/SparseLowering.cpp
120–123

This was suggested earlier too, but more types will be added later (else-ifs), where this kind of flow makes more sense.

mlir/lib/Dialect/Linalg/Transforms/Sparsification.cpp
532

I will make a pass to see where we can safely add const.

mlir/lib/ExecutionEngine/SparseUtils.cpp
401–404

Ok, will change.