This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] use SparseStorageEncodingAttr for sparse_tensor.storage_specifier
AbandonedPublic

Authored by Peiming on Dec 16 2022, 3:07 PM.

Details

Summary

SparseTensor::StorageSpecifierType used to take SparseTensorEncodingAttr to calculate the memory layout. Yet, SparseTensorEncodingAttr holds extra information that are not needed by storage specifier (e.g., dimension level properties, dimension ordering, etc).

Diff Detail

Event Timeline

Peiming created this revision.Dec 16 2022, 3:07 PM
Herald added a project: Restricted Project. · View Herald Transcript
Peiming requested review of this revision.Dec 16 2022, 3:07 PM
Peiming edited the summary of this revision. (Show Details)
Peiming updated this revision to Diff 483680.Dec 16 2022, 3:09 PM

fix typo.

Peiming updated this revision to Diff 483684.Dec 16 2022, 3:13 PM

revert unintended change.

wrengr added inline comments.Dec 16 2022, 7:05 PM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
160–161

I think it's clearer to say that the enum defines all the "storage formats/representations themselves, without any level properties."

163

Fwiw, now that I've introduced the "dim"-vs-"lvl" terminology and gotten the bulk of the codebase rewritten to make that distinction, I'm planning to rename DimLevelType to just LevelType in the near future. And the name I've had in mind for the propertyless-DLT enum has been LevelFormat (mnemonic: "type = format + properties").

So if you want to minimize future code churn, you should name this LevelFormat. Though if you'd rather wait until I do the s/DimLevelType/LevelType/g change, then I think StorageLevelType is a fine name for the interim.

171

Could you add some static_asserts that the DLT and SLT versions of this function agree? Or file a ticket to do so, assign it to me (so I don't forget), and add a TODO with the ticket number?

260

This name is too generic, since we could in principle construct a bunch of different things out of a DLT. I'd suggest something more like getStorageLevelType(/getLevelFormat).

266–267

I think it'd be better to just static_assert(std::is_same_v<std::underlying_type_t<StorageLevelType>, std::underlying_type_t<DimLevelType>>); That's enough to make sure their underlying types agree, but minimizes the number of places we'd need to change things if we decide we need to change the underlying type to something wider.

You should also add a sentence to the documentation explaining that this is a simplification to help ensure efficiency or whatever. (E.g., in the long run if we end up expanding the LevelType out to a larger struct, or add a ton of new properties, then it'd no longer make sense to require the LevelFormat have the same bitwidth.) My main concern here is that this is a rather different sort of assertion than the others, which are strictly about ensuring that the bitbashing is correct.

271–276

I'm not sure I like these assertions... The next block of assertions that fromDLT does what we expect, those are great. But these assertions that fromDLT is bitwise identity on the ordered+unique DLTs, these smell fishy to me.

For example, we could instead define SLT fromDLT(DLT x) { return static_cast<SLT>(static_cast<uint8_t>(x) >> 2); } (and update the enum class SLT accordingly). Why should the assertions rule that out? If there's any code that relies on fromDLT being bitwise identity on ordered+unique arguments, I expect that code is going to be depending on the underlying bit patterns of DLT in ways that it shouldn't.

In fact, I'd go so far as to say that the x >> 2 implementation should be the preferred one, precisely because it will catch out client code that buggily depends on the underlying bit patterns.

280–289

I'm a bit surprised that this compiles as written, since you haven't defined operator== on StorageLevelType. On a quick look I couldn't find anything in the spec that defines how default operator== works on enum class, so I'm not sure how portable this'll be; though it was just a quick look, so I may well have missed it.

(Personally, as a language designer, I'd think the default operator== should be fine here since it doesn't do anything to break the encapsulation that enum class provides. But just because I think it's reasonable doesn't mean the C++ standards committee does ;)

Peiming added inline comments.Dec 16 2022, 7:57 PM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
171

I think string comparison can not be a constexpr until C++20

wrengr added inline comments.Dec 16 2022, 8:06 PM
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorAttrDefs.td
31

"to encode" or "for encoding"

31

"about"

31

Since the driving motivation for factoring out StorageLevelType and SparseStorageEncodingAttr is to remove the "properties" of DLT, you should avoid using that word here since it's liable to cause confusion.

So perhaps something more like, "An attribute for encoding TACO-style information about a storage format for represent ing sparse tensors, without the additional semantic properties encoded by SparseTensorEncodingAttr."

40

should be "storage level type" (or "level format")

40

should be "level"

43

"per level storage type" (or "per level storage format")

44

This should be plural.

53–54

Both of those should be "SparseTensorEncodingAttr" (since just saying "TensorEncodingAttr" suggests that it might be some more general thing that dense tensors could also have)

mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
48

Is there any way you can get this to share more code with SparseTensorEncodingAttr::parse? It'll be a nightmare to try to keep two copies of almost identical parsers in sync

58

the name dlt doesn't match StorageLevelType

77

For DLT the undef value is purely for internal use (e.g., when we need to preinitialize an array of DLTs), and hence "undef" is intentionally not a valid MLIR token, since MLIR code should never construct the undef value.

Do you really mean for SLT::Undef to be something that MLIR code can construct?

77–84

You can reduce maintenance burden by instead having something more like: for (auto x : allSLTs) { if (strVal == toMLIRString(x)) { dlt.push_back(x); break; }}. That way we don't have to worry about keeping two sets of string literals in sync.

488

I think this is a perfectly good place to use auto. Also the variable name dlts is no longer appropriate with the new type

Peiming added inline comments.Dec 16 2022, 9:09 PM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
271–276

Okay, I will delete it. It is somehow overlapped with the tests below too.

mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
77

This is a good question, I don't think undef should be in STEA in the first place (You can use the DLT internally, but it can not and should not be parsed in). Since you raised the question, I would suggest we fix it when parsing STEA as well (And verify fails when there is an undef DLT in the STEA)

Peiming added inline comments.Dec 16 2022, 9:16 PM
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
77

Oh, shot! I did not realize undef is not a branch when parsing STEA... But verification failure is still something we can add ;-)

wrengr added inline comments.Dec 19 2022, 11:52 AM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
171

Ah, yes, quite right.

mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
77

I think it's better to fail during the parsing step rather than during the subsequent validation. Failing earlier typically makes it easier to give more relevant/targeted error messages, plus it reduces the cost/need for subsequent validation.

77–84

This implementation also makes it easy to fail-to-parse-undef, just don't have SLT::Undef in the allSLTs list you're looping over :)

aartbik added inline comments.Dec 19 2022, 8:58 PM
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h
162

I would mention the fact that this focuses on the "format" and not the "properties" (see comment above)

269

Ensure would be more consistent with others (I actually prefer the Ensures, but looking at other style in the file, I would go with majority)

mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorAttrDefs.td
26

, -> .

34

Read more in ....

Or,

For more information, see ....

194

Can we derive from the storage class to avoid duplicating these fields?
Or is that too much OO in tablegen. Also, it probably would require factoring out the "format" part....

mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp
79

aren't we duplicating a lot of very similar code now for the printing/parsing?

Peiming abandoned this revision.Dec 20 2022, 3:53 PM