This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Distinguishing "shape" from "sizes" in variable names
ClosedPublic

Authored by wrengr on Mar 18 2022, 8:04 PM.

Details

Summary

I'm using "shape" to mean the compile-time object, where zeros indicate sizes which are compile-time dynamic; and using "sizes" to mean the run-time object, where zeros indicate a dimension with no coordinates (hence resulting in trivial storage). Because their semantics differ on zeros, it's important to keep them distinguished. Although we do not define separate C++ types to capture the distinction, we can at least use variable names to do so.

This is (tangential) work towards fixing: https://github.com/llvm/llvm-project/issues/51652

Depends On D122057

Diff Detail

Event Timeline

wrengr created this revision.Mar 18 2022, 8:04 PM
wrengr requested review of this revision.Mar 18 2022, 8:04 PM

I agree with the general direction, but am not so sure on size vs shape here.
Can we come up with something that keeps "size" for both cases, but makes the distinction.
Because in some case we compare one "size" with the other "size".
E.g. dynSize and staticSize but I am open for better stuff.

I agree with the general direction, but am not so sure on size vs shape here.
Can we come up with something that keeps "size" for both cases, but makes the distinction.
Because in some case we compare one "size" with the other "size".
E.g. dynSize and staticSize but I am open for better stuff.

I just used "shape" since that's what MLIR uses elsewhere for its type-system's notion of the complete collection of compile-time sizes which are potentially of "unknown" (?) value. But I'm cool with other names. As for what names to use, I have a number of thoughts/concerns but can't quite organize them at the moment. So with no particular order nor organization:

(1) I'm fine with the "dyn" prefix. However, I think it's not entirely clear which one it should refer to: on the one hand, it could mean the run-time sizes of the SparseTensorStorage itself, since they are compile-time-dynamic entities; on the other hand, it could mean MLIR's type-system notion of shapes which may contain "unknown" (?) sizes. I haven't noticed the MLIR documentation being particularly clear/consistent about whether "dynamic shape" means the compile-time shape with unknown-sizes vs the run-time sizes which instantiate that compile-time shape; but if they have an official stance on the jargon then we should be sure to follow it.

(2) I don't like the "static" prefix: since in C++ "static" means compile-time-fixed, which doesn't cleanly apply to either of the things here: the run-time sizes aren't compile-time-fixed, obviously; and although MLIR's type-system shapes are compile-time entities, it's a bit peculiar to refer to a ranked shape with unknown sizes as "static" imo. I also dislike the prefix because "static" is terribly overloaded in C++.

(3) Once D122061 lands we will also want to make the distinction between dimension-sizes vs assembled-sizes (where the latter term is taken from TACO). Prior to that differential we do already have assembled-sizes, it's just that we don't really care very much about them so there's no pressing need to give them their own naming convention. Assembled-sizes are always compile-time-dynamic (since they're not captured by the MLIR type system nor by the C++ type system), and the only thing we ever compare them against is themselves (i.e., to ensure they haven't changed when mutating something else), so there's no analogue of the "shape"-vs-"sizes" situation for assembled-sizes.

aartbik accepted this revision.Mar 21 2022, 3:34 PM

Looking more carefully again, having the sizes array (plural) vs shape (singular, i.e. a tuple of sizes) actually makes a lot of sense to make the distinction.
So this is fine as is!

This revision is now accepted and ready to land.Mar 21 2022, 3:34 PM