SparseTensor::StorageSpecifierType used to take SparseTensorEncodingAttr to calculate the memory layout. Yet, SparseTensorEncodingAttr holds extra information that are not needed by storage specifier (e.g., dimension level properties, dimension ordering, etc).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h | ||
---|---|---|
160–161 | I think it's clearer to say that the enum defines all the "storage formats/representations themselves, without any level properties." | |
163 | Fwiw, now that I've introduced the "dim"-vs-"lvl" terminology and gotten the bulk of the codebase rewritten to make that distinction, I'm planning to rename DimLevelType to just LevelType in the near future. And the name I've had in mind for the propertyless-DLT enum has been LevelFormat (mnemonic: "type = format + properties"). So if you want to minimize future code churn, you should name this LevelFormat. Though if you'd rather wait until I do the s/DimLevelType/LevelType/g change, then I think StorageLevelType is a fine name for the interim. | |
171 | Could you add some static_asserts that the DLT and SLT versions of this function agree? Or file a ticket to do so, assign it to me (so I don't forget), and add a TODO with the ticket number? | |
260 | This name is too generic, since we could in principle construct a bunch of different things out of a DLT. I'd suggest something more like getStorageLevelType(/getLevelFormat). | |
266–267 | I think it'd be better to just static_assert(std::is_same_v<std::underlying_type_t<StorageLevelType>, std::underlying_type_t<DimLevelType>>); That's enough to make sure their underlying types agree, but minimizes the number of places we'd need to change things if we decide we need to change the underlying type to something wider. You should also add a sentence to the documentation explaining that this is a simplification to help ensure efficiency or whatever. (E.g., in the long run if we end up expanding the LevelType out to a larger struct, or add a ton of new properties, then it'd no longer make sense to require the LevelFormat have the same bitwidth.) My main concern here is that this is a rather different sort of assertion than the others, which are strictly about ensuring that the bitbashing is correct. | |
271–276 | I'm not sure I like these assertions... The next block of assertions that fromDLT does what we expect, those are great. But these assertions that fromDLT is bitwise identity on the ordered+unique DLTs, these smell fishy to me. For example, we could instead define SLT fromDLT(DLT x) { return static_cast<SLT>(static_cast<uint8_t>(x) >> 2); } (and update the enum class SLT accordingly). Why should the assertions rule that out? If there's any code that relies on fromDLT being bitwise identity on ordered+unique arguments, I expect that code is going to be depending on the underlying bit patterns of DLT in ways that it shouldn't. In fact, I'd go so far as to say that the x >> 2 implementation should be the preferred one, precisely because it will catch out client code that buggily depends on the underlying bit patterns. | |
280–289 | I'm a bit surprised that this compiles as written, since you haven't defined operator== on StorageLevelType. On a quick look I couldn't find anything in the spec that defines how default operator== works on enum class, so I'm not sure how portable this'll be; though it was just a quick look, so I may well have missed it. (Personally, as a language designer, I'd think the default operator== should be fine here since it doesn't do anything to break the encapsulation that enum class provides. But just because I think it's reasonable doesn't mean the C++ standards committee does ;) |
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h | ||
---|---|---|
171 | I think string comparison can not be a constexpr until C++20 |
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorAttrDefs.td | ||
---|---|---|
31 | "to encode" or "for encoding" | |
31 | "about" | |
31 | Since the driving motivation for factoring out StorageLevelType and SparseStorageEncodingAttr is to remove the "properties" of DLT, you should avoid using that word here since it's liable to cause confusion. So perhaps something more like, "An attribute for encoding TACO-style information about a storage format for represent ing sparse tensors, without the additional semantic properties encoded by SparseTensorEncodingAttr." | |
40 | should be "storage level type" (or "level format") | |
40 | should be "level" | |
43 | "per level storage type" (or "per level storage format") | |
44 | This should be plural. | |
53–54 | Both of those should be "SparseTensorEncodingAttr" (since just saying "TensorEncodingAttr" suggests that it might be some more general thing that dense tensors could also have) | |
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
48 | Is there any way you can get this to share more code with SparseTensorEncodingAttr::parse? It'll be a nightmare to try to keep two copies of almost identical parsers in sync | |
58 | the name dlt doesn't match StorageLevelType | |
77 | For DLT the undef value is purely for internal use (e.g., when we need to preinitialize an array of DLTs), and hence "undef" is intentionally not a valid MLIR token, since MLIR code should never construct the undef value. Do you really mean for SLT::Undef to be something that MLIR code can construct? | |
77–84 | You can reduce maintenance burden by instead having something more like: for (auto x : allSLTs) { if (strVal == toMLIRString(x)) { dlt.push_back(x); break; }}. That way we don't have to worry about keeping two sets of string literals in sync. | |
488 | I think this is a perfectly good place to use auto. Also the variable name dlts is no longer appropriate with the new type |
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h | ||
---|---|---|
271–276 | Okay, I will delete it. It is somehow overlapped with the tests below too. | |
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
77 | This is a good question, I don't think undef should be in STEA in the first place (You can use the DLT internally, but it can not and should not be parsed in). Since you raised the question, I would suggest we fix it when parsing STEA as well (And verify fails when there is an undef DLT in the STEA) |
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
---|---|---|
77 | Oh, shot! I did not realize undef is not a branch when parsing STEA... But verification failure is still something we can add ;-) |
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h | ||
---|---|---|
171 | Ah, yes, quite right. | |
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
77 | I think it's better to fail during the parsing step rather than during the subsequent validation. Failing earlier typically makes it easier to give more relevant/targeted error messages, plus it reduces the cost/need for subsequent validation. | |
77–84 | This implementation also makes it easy to fail-to-parse-undef, just don't have SLT::Undef in the allSLTs list you're looping over :) |
mlir/include/mlir/Dialect/SparseTensor/IR/Enums.h | ||
---|---|---|
162 | I would mention the fact that this focuses on the "format" and not the "properties" (see comment above) | |
269 | Ensure would be more consistent with others (I actually prefer the Ensures, but looking at other style in the file, I would go with majority) | |
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorAttrDefs.td | ||
26 | , -> . | |
34 | Read more in .... Or, For more information, see .... | |
194 | Can we derive from the storage class to avoid duplicating these fields? | |
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
79 | aren't we duplicating a lot of very similar code now for the printing/parsing? |
I think it's clearer to say that the enum defines all the "storage formats/representations themselves, without any level properties."