This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] introduce sparse_tensor::StorageSpecifierToLLVM pass
Closed, Public

Authored by Peiming on Dec 15 2022, 10:45 AM.

Event Timeline

Peiming created this revision.Dec 15 2022, 10:45 AM
Herald added a reviewer: aartbik.
Herald added a project: Restricted Project.
Peiming requested review of this revision.Dec 15 2022, 10:45 AM
Peiming updated this revision to Diff 483244.Dec 15 2022, 11:01 AM

add comments.

Peiming updated this revision to Diff 483246.Dec 15 2022, 11:03 AM

add check tests.

Peiming added inline comments.Dec 15 2022, 11:50 AM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorBuilder.h
157 ↗(On Diff #483246)

Note that there are currently two SparseTensorDescriptor classes in this patch; this one is the updated version and will be kept, while the other will be deleted in the next patch.

This is because in this patch the storage_specifier-related operations already rely on the updated memory scheme, while all other operations do not. So I include both here simply to avoid merge conflicts.

This patch and the next patch will be squashed and then pushed, so this weird duplication won't appear in the upstream git history.

wrengr added inline comments.Dec 15 2022, 5:13 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorBuilder.h
157 ↗(On Diff #483246)

So long as this patch compiles fine, I'd suggest just landing it as-is (after approval) rather than squashing it with D140130. That makes things a bit easier to track when trying to work backwards from the GitHub commits to the Phabricator reviews.

wrengr added inline comments.Dec 15 2022, 5:28 PM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
148–150 ↗(On Diff #483392)

This comment isn't valid/necessary. (1) We already make sure to use only the last two bits for the properties. (2) The preceding documentation says that the actual bitpatterns should not be relied on, so the fact that we put the properties in the last bits rather than somewhere else is also something that shouldn't be specified in the documentation. (3) There is no isSameDLT function, and again the `& ~3` part shouldn't be in the documentation, since the actual bitpatterns shouldn't be relied on.

If you meant for this to be addressing my earlier comments about validity concerns with the return type of stripLevelProperties, then I would instead have the comment read "NOTE: Future extensions should ensure that isValidDLT(stripLevelProperties(dlt)) == true remains valid." Of course, whether we actually want to promise that or not is a design decision that's still waiting for @aartbik to chime in on.
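For illustration only, here is a minimal, self-contained sketch of that invariant with made-up names (MyDLT, isValidMyDLT, stripProperties); the dialect deliberately leaves the actual Enums.h bitpatterns unspecified, so the numeric values below are not the real ones:

```cpp
#include <cstdint>

// Made-up level-type values: the high bits pick the format, the low two bits
// are the (inverted) "ordered"/"unique" properties.
enum class MyDLT : uint8_t {
  Dense = 4,           // format 001, properties 00
  Compressed = 8,      // format 010, properties 00
  CompressedNu = 9,    // format 010, properties 01 (not unique)
  CompressedNo = 10,   // format 010, properties 10 (not ordered)
  CompressedNuNo = 11, // format 010, properties 11
};

constexpr bool isValidMyDLT(MyDLT dlt) {
  const auto raw = static_cast<uint8_t>(dlt);
  return raw == 4 || (raw >= 8 && raw <= 11);
}

// Clearing the low two (property) bits yields the ordered+unique variant.
constexpr MyDLT stripProperties(MyDLT dlt) {
  return static_cast<MyDLT>(static_cast<uint8_t>(dlt) & ~uint8_t{3});
}

// The invariant proposed above, checked at compile time.
static_assert(isValidMyDLT(stripProperties(MyDLT::CompressedNuNo)),
              "stripping properties must preserve validity");
```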

wrengr added inline comments.Dec 15 2022, 5:33 PM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
185 ↗(On Diff #483392)

I mentioned some validity concerns about using this return type, though it was on a different CL. Can you rebase the current CL to be on top of that one, to avoid duplicate reviewing effort?

wrengr added inline comments.Dec 15 2022, 5:54 PM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
185 ↗(On Diff #483392)

(unless this was the CL in question and phabricator just lost the earlier comments for some reason, of course)

mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorTypes.td
57

Why this change? Do we not want these builders to support the getChecked variants?

74

I don't recall the details at the moment, but I believe there's a different field you can set to have it generate the headers but not the bodies for the default builders. That would allow you to avoid adding the get method to the extraClassDeclaration, as well as enable the getChecked variants of things (if/when we add a verifier).

mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.td
305

Should say "LLVM code" rather than "actual code"

mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
338

You should probably prune out identity mappings here too.

Actually, the code for pruning out identity mappings should probably be moved to the SparseTensorEncodingAttr builders, so that we do this uniformly everywhere.
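As a rough sketch of what that builder-side pruning could look like (pruneIdentity is a hypothetical helper; the real SparseTensorEncodingAttr builder takes more parameters and is ODS-generated):

```cpp
#include "mlir/IR/AffineMap.h"

// An identity dimOrdering carries no information, so canonicalize it to the
// null map. Doing this once, inside the attribute builder, guarantees that
// structurally equal encodings also compare equal everywhere.
static mlir::AffineMap pruneIdentity(mlir::AffineMap dimOrdering) {
  return (dimOrdering && dimOrdering.isIdentity()) ? mlir::AffineMap()
                                                   : dimOrdering;
}
```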

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorBuilder.h
25 ↗(On Diff #483392)

I'm not sure what you mean by this comment, but "detail" is the standard name for the namespace holding internal bits that need to be in the header but shouldn't be considered part of the public API.

55 ↗(On Diff #483392)

Why uint32_t instead of something else like uint8_t, or leaving the representation unspecified?

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorPasses.cpp
198–199

Not relevant to this code, but I only thought of it here: why are these ops named {Get,Set}StorageSpecifierOp rather than StorageSpecifier{Get,Set}Op? The latter seems more consistent with StorageSpecifierInitOp, as well as with the storage_specifier.{get,set} MLIR name.

Peiming added inline comments.Dec 16 2022, 9:36 AM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
185 ↗(On Diff #483392)

I abandoned that one and split it into two (with this one being one of the two).

mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorTypes.td
74

This is to ensure that every time you call it, it normalizes the encoding (the default builder simply takes the encoding as-is).

This also answers your previous question: $_get => Base::get, get => our customized builder.
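A purely illustrative, self-contained sketch of that control flow; EncodingAttr, getRaw, and normalize are invented names standing in for the ODS-generated Base::get ($_get) and the hand-written get:

```cpp
// Stand-in for the attribute class; everything here is invented.
struct EncodingAttr {
  unsigned levelBits;

  // Plays the role of `$_get` / Base::get: stores the value verbatim.
  static EncodingAttr getRaw(unsigned levelBits) { return {levelBits}; }

  // Plays the role of the customized `get`: every caller goes through here,
  // so the stored encoding is always in normalized form.
  static EncodingAttr get(unsigned levelBits) {
    return getRaw(normalize(levelBits));
  }

  // Toy normalization; the real builder normalizes the sparse encoding.
  static unsigned normalize(unsigned levelBits) { return levelBits & ~3u; }
};
```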

aartbik added inline comments.Dec 16 2022, 10:46 AM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
148 ↗(On Diff #483392)

I agree we need to unify the documentation of the enum on how properties are encoded. I would simply change the text to state that the last "x" bits are used for encoding properties (rather than stating it in the negative, i.e. that it will break when that changes ;-).

Also, perhaps we want to reserve a few more bits, since I expect we will get more properties in the future (but that is probably more a question for Wren than for this review).

184 ↗(On Diff #483392)

Note that "stripping" is not really quite right. By resetting the bits, you basically make the property ordered and unique (since that is the zero encoding). I think it would also help if you explain the purpose of this function (compare two without properties? make them ordered/unique?)

mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorTypes.td
74

In that case, we should really document these subtleties here, since this seems like a detail that is easy to forget when reading this code again in the future ;-)

mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.h
202

Since this is a pass with a type rewriter, I would give it its own "header" rather than stuffing it into the "other" section (in fact, I want to remove this "other" section altogether and give each pass its own header for clarity).

mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.td
305

Also "rewrite sparse primitives" is overly generic. How about just saying

"Lowers the sparse specifier to LLVM strutures"

or something of that nature?

mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
327

Since this is a local function, you will need to document it. Also, "normalized" has some meaning, but it is also vague enough that it helps to see what the intended usage is.

And like Wren also said, this feels like something that needs to happen once while building, with that property then always guaranteed going forward.

mlir/lib/Dialect/SparseTensor/Transforms/SparseSpecifierToLLVM.cpp
1 ↗(On Diff #483392)

Writing yet another vectorizer? ;-) ;-)

48 ↗(On Diff #483392)

I have a very hard time figuring out what "value" is here. Help me!

71 ↗(On Diff #483392)

no line break

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorBuilder.h
1 ↗(On Diff #483392)

I feel we need to bikeshed a bit more on the name. SparseTensorBuilder is accurate, but also misleading (we are really talking about the layout in memory here).

Following the class names, I would like to see something with Layout or Specifier in it.

9 ↗(On Diff #483392)

lowering and accessing?

25 ↗(On Diff #483392)

I think you need to make it very clear that this is a literal copy of what is in CodegenUtils right now, and state your future plans for how to mitigate the code duplication!

Peiming added inline comments.Dec 16 2022, 10:55 AM
mlir/lib/Dialect/SparseTensor/Transforms/SparseSpecifierToLLVM.cpp
48 ↗(On Diff #483392)

This is inherited from StructBuilder, which keeps the current SSA value for the llvmStruct.

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorBuilder.h
55 ↗(On Diff #483392)

There is no I8Attr in TableGen; I just want to make sure it has the same type as SpecifierEnum (which is generated with TableGen). I can change it to uint8_t.
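As a small illustration of the point (all names below are invented), giving the enum an explicit underlying type lets the hand-written storage field match the TableGen-generated enum's representation exactly, whatever width ends up being chosen:

```cpp
#include <cstdint>
#include <type_traits>

// Fixed underlying type, mirroring whatever the generated enum uses.
enum class SpecifierKindSketch : uint8_t { DimSize, PtrMemSize, IdxMemSize };

// The hand-written storage field can then use exactly that representation.
using SpecifierStorage = std::underlying_type_t<SpecifierKindSketch>;

static_assert(std::is_same_v<SpecifierStorage, uint8_t>,
              "storage field matches the enum's underlying type");
```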

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorPasses.cpp
198–199

Hmmm, makes sense, I will change it in the next revision.

wrengr added inline comments.Dec 16 2022, 12:00 PM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
148 ↗(On Diff #483392)

Since the preceding documentation says clients shouldn't rely on the specific bitpatterns, we can always rearrange things in the future. So long as clients actually adhere to that API, nothing should break whenever we make changes. (Alas, the CHECK tests are bad clients, so they will need to be updated every time we change things.)

Fwiw, it was an intentional decision to specifically exclude the bitpatterns from the API. The current encoding is just the simplest thing imaginable, so I anticipate that as we update the set of supported DLTs we may well discover some better encodings. As such, I wanted to make sure the enum gives us the flexibility to switch to better encodings as we discover them.

(I didn't have any particular new encodings in mind when I wrote the enum, but recently a few possibilities have started to appear on the horizon. First example: once the ITPACK support goes in, we'll have four encodings that capture all the possibilities of whether or not they have a pointer array or an index array (dense = 00, singleton = 01, counted = 10, compressed = 11). So it could be helpful to adjust the numbers for those DLTs to capture those bitpatterns. Second example: once we start doing the partial-AOS transformation, it may make sense to have the DLT for the COO-levels include a field storing the count of how many singleton levels are combined into the compressed level.)
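A speculative sketch of that first example, with invented names and one plausible reading of the bit assignment; nothing like this exists in the dialect today. It only shows how such a two-bit layout would turn the has-pointer/has-index questions into single bit tests:

```cpp
#include <cstdint>

enum class LevelKindSketch : uint8_t {
  Dense = 0b00,      // neither a pointer array nor an index array
  Singleton = 0b01,  // index array only
  Counted = 0b10,    // pointer array only
  Compressed = 0b11, // both
};

constexpr bool hasIndexArray(LevelKindSketch k) {
  return static_cast<uint8_t>(k) & 0b01;
}
constexpr bool hasPointerArray(LevelKindSketch k) {
  return static_cast<uint8_t>(k) & 0b10;
}

static_assert(hasIndexArray(LevelKindSketch::Singleton) &&
                  hasPointerArray(LevelKindSketch::Counted) &&
                  !hasPointerArray(LevelKindSketch::Dense),
              "bit tests match the table above");
```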

Peiming added inline comments.Dec 16 2022, 3:18 PM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
148–150 ↗(On Diff #483392)
Peiming updated this revision to Diff 484628.Dec 21 2022, 11:11 AM
Peiming marked 21 inline comments as done.

address comments.

mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorTypes.td
57

It does not invalidate getChecked.

mlir/lib/Dialect/SparseTensor/Transforms/SparseSpecifierToLLVM.cpp
1 ↗(On Diff #483392)

I should definitely configure my editor to insert the header instead of copying it myself ;-)

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorBuilder.h
1 ↗(On Diff #483392)

Changed to SparseTensorStorageLayout.cpp/h

25 ↗(On Diff #483392)

This namespace will exist only for this patch and will be removed soon. It is here simply to prevent name collisions. I don't think it really matters what name I picked here.

wrengr added inline comments.Dec 21 2022, 3:52 PM
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorTypes.td
74

Yeah, I thought $_get got substituted with either get or getChecked from the current class, rather than from the base class.

aartbik added inline comments.Dec 22 2022, 10:55 AM
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorTypes.td
56–57

This was already there, but should we not put this in the validation rather than asserting during build (just so that we get a more consistent message to the user)?

mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.h
162

StorageSpecifierToLLVM seems like a slightly better name (together with a filename change for the pass).

mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.td
304

sparse-storage-specifier-to-llvm?

Note that I would keep the sparse- prefix here, since all our flags have "sparse" in them one way or the other.

307

rewrites

308

and converts

a llvm.struct (not an)

mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
331

Yeah, this has become very reasonable now.

Note that it is still possible that we change our minds on what "normalized" means, but now we at least have very clear semantics for what we do with the properties.

We can bikeshed on what "normalized" means later, but at least our type system is very clean now (format/properties).

aartbik added inline comments.Dec 22 2022, 11:07 AM
mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.td
308

I was wrong. I see on the website "Writing an LLVM Pass" :-)

Peiming updated this revision to Diff 484905.Dec 22 2022, 11:16 AM
Peiming marked 7 inline comments as done.

address comments.

Peiming marked an inline comment as done.Dec 22 2022, 11:16 AM
Peiming added inline comments.
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorTypes.td
56–57

Yeah, deleted.

aartbik added inline comments.Dec 22 2022, 12:18 PM
mlir/include/mlir/Dialect/SparseTensor/Transforms/Passes.h
162

You need to update the comment too

SparseSpecifierToLLVM -> StorageSpecifierToLLVM

mlir/lib/Dialect/SparseTensor/Transforms/SparseSpecifierToLLVM.cpp
1 ↗(On Diff #484905)

Move the file and comment to StorageSpecifierToLLVM.cpp?

181 ↗(On Diff #484905)

StorageSpecifierXXXOp ?

aartbik accepted this revision.Dec 22 2022, 12:19 PM

LGTM after you make the edits

This revision is now accepted and ready to land.Dec 22 2022, 12:19 PM
Peiming updated this revision to Diff 484927.Dec 22 2022, 12:29 PM
Peiming marked 2 inline comments as done.

address comments.