This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Cleaning up names in {Merger,LoopEmitter,CodegenEnv}.{h,cpp}
ClosedPublic

Authored by wrengr on Mar 9 2023, 5:01 PM.

Details

Summary

This change does a bunch of renaming to clear up confusions in these files. In particular, this change:

  • Renames variables and methods to clarify the "dim"/"lvl" distinction, and changes them to use the Dimension/Level types as appropriate.
  • Introduces new typedefs
    • ExprId, LatPointId, LatSetId: to clarify the interning design of the Merger.
    • LoopId, LoopOrd: to clarify the distinction between arbitrary names for loop-variables, vs numeric identifiers based on the actual order of loop generation.
    • TensorId
    • (Future CLs will change these from typedefs to structs/classes, so that the typechecker can help avoid mixups.)
  • Updates documentation to match the new terminology
  • Adds additional assertions
  • Adds const to local variables along the way

Diff Detail

Event Timeline

wrengr created this revision.Mar 9 2023, 5:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2023, 5:01 PM
wrengr requested review of this revision.Mar 9 2023, 5:01 PM
wrengr added a comment.Mar 9 2023, 5:42 PM

This version of the patch is based off an old upstream revision, hence all the comments about having backported the changes from D145532. I am currently in the process of rebasing the patch to incorporate D145532 properly, and will upload the new version once it finishes compiling locally (the be sure I didn't introduce any rebasing bugs).

wrengr updated this revision to Diff 504002.Mar 9 2023, 6:35 PM

Rebasing.

This rebase causes Integration/Dialect/SparseTensor/CPU/reshape_dot.mlir to fail, whereas it was working before. The immediate cause is a failed assertion about dstLvl in LoopEmitter::getCollapseReassociation. I'm still re-debugging that, but am uploading this version for reviewers to get started on.

Peiming added inline comments.Mar 10 2023, 9:24 AM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
223

Yeah, probably we should.

225

Initialize*s* (finally my turn ;-))

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
306

I won't spend too much effort on this because

  1. this is a temporary workaround.
  2. In the most cases, the reassoc is empty, and each level is queried just once.
wrengr added inline comments.Mar 10 2023, 11:40 AM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
225

Fwiw, the style Aart has been pushing is: (1) for documentation on functions/methods, use the indicative ("-s"); but (2) for commentary on statements, use the imperative ("-{}")

;)

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
306

Even if it's queried just once, still makes sense to me to compute it when we store it. (The exception would be if it's expensive to compute (which it isn't really), or if a bunch of them are stored but never queried.)

If it's just a temporary workaround, what's the longer-term solution? Cuz the workaround is pretty invasive— conceptually speaking (since it means we need to be sure to distinguish/clarify srcLvl-vs-dstLvl in a lot of places); not operationally speaking. Fwiw, if by the workaround-ness you mean our goal to split off the "view" stuff from the "real tensor" stuff, then I'm not entirely sure if that'd help the conceptual problems here. Even if/when we have separate MLIR-types for views vs tensors, we'll still need to handle the combinations together (e.g., if someone does a (tensor,view)-matmul then the shared dimension-axis will need to iterate over the corresponding source-tensor levels and then do the appropriate filtering to combine them).

In any case, the main goal for this CL is just to rename things (and for this method, to move it to the appropriate section rather than having it interleaved with the data members).

wrengr marked an inline comment as done.Mar 10 2023, 11:44 AM

This rebase causes Integration/Dialect/SparseTensor/CPU/reshape_dot.mlir to fail, whereas it was working before. The immediate cause is a failed assertion about dstLvl in LoopEmitter::getCollapseReassociation. I'm still re-debugging that, but am uploading this version for reviewers to get started on.

Hmm, everything passed on the buildbot. I wonder how my local repo differs...

wrengr updated this revision to Diff 504237.Mar 10 2023, 11:45 AM

resolving todo re reserving room for LoopEmitter::loopSeqStack

Peiming added inline comments.Mar 10 2023, 12:22 PM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
486

You should use dstLvl here

Peiming added inline comments.Mar 10 2023, 12:24 PM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
306

The ulimate solution is to have a sparse tensor view, (or whatever you prefer) and a set of operations on top them.

wrengr updated this revision to Diff 504258.Mar 10 2023, 12:34 PM
wrengr marked an inline comment as done.

Addressing comment to fix bug in LoopEmitter::enterLoopOverTensorAtLvl

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
486

Thanks :)

wrengr updated this revision to Diff 504307.Mar 10 2023, 5:01 PM

Rebased over D141532

wrengr updated this revision to Diff 504316.Mar 10 2023, 5:58 PM

Resolving a todo in Merger.cpp

wrengr updated this revision to Diff 504319.Mar 10 2023, 6:19 PM

Improving documentation for Merger::setHasSparseOut

are we really going to follow up on all the TODO's and notes?
is it really worth putting them *all* in our codebase?

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
33

You are adding *a lot* of comments with TODOs that read more like notes to self on how to improve the code. It makes the actual code actually a bit harder to read, since each block of comments looks like detailed documentation, but really is not

220

I really like the n, e, b short convention, but not it becomes very spelled out and breaks the lines

497–501

This was documented at some point, but perhaps the convention was lost.
The vector as used for anything with fixed a priori size, the others for stuff that had variable length (but not too much)

are we really going to follow up on all the TODO's and notes?
is it really worth putting them *all* in our codebase?

Some of the todos are really questions to the team (e.g., the std::vector vs SmallVector discrepancy, and explicating how LoopId/LoopOrd interact with AffineDimExpr/LinalgIndexOp, cleaning up the "idx"/"ldx" names). And the fixmes are things I'd love to resolve, but I couldn't figure out what they were supposed to be from context, so those're also for the folks more familiar with those particular parts of the codebase.

But yes, I was planning to follow up on the others (making Kind nested/enumclass, encoding arity into the Kind, making the aliases into structs, using AOS for LoopEmitter). I can remove them if you'd prefer

wrengr updated this revision to Diff 504788.Mar 13 2023, 12:02 PM
wrengr marked an inline comment as done.

updating comment/todo re std::vector vs llvm::SmallVector

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
220

I don't quite follow what you mean here

497–501

Aha. I'll convert the todo to a comment explaining that

Though I do worry about the fact that operator[] has different semantics for the two types; since the type is not evident at the callsites, and thus it's not immediately apparent whether we need to assert against OOB or not.

wrengr updated this revision to Diff 504794.Mar 13 2023, 12:22 PM

Doing a few more renames:

  • Merger
    • conjLatPoint -> conjLat
    • takeConj -> conjSet
    • takeDisj -> disjSet
    • takeCombi -> combiSet
  • CodegenEnv
    • getTensorExp -> getExprId

The Merger methods are renamed to follow the "lat"/"set" naming convention used by the other methods. The CodegenEnv method is renamed because the return type is ExprId rather than TensorExp.

wrengr updated this revision to Diff 504795.Mar 13 2023, 12:23 PM

git-clang-format

wrengr updated this revision to Diff 504821.Mar 13 2023, 1:13 PM

Marking Merger::buildExp const

aartbik accepted this revision.Mar 14 2023, 10:30 AM
aartbik added inline comments.
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
220

Oh, I meant that using n,e,b as parameter name was a convention in this file and gave rise to very short lines for the methods, like constructor and such.

But in general, spelling things out, like you do, is better, so this was more me thinking out loud ;-)

This revision is now accepted and ready to land.Mar 14 2023, 10:30 AM
Peiming added inline comments.Mar 14 2023, 10:34 AM
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
43

I think unsigned is good enough here if you interpret it as the index to loop-stack

wrengr updated this revision to Diff 505198.Mar 14 2023, 11:42 AM
wrengr marked 4 inline comments as done.

Changing LoopOrd to use unsigned

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
220

Ah, gotcha. I was mostly just trying to stick with the preexisting style re single- vs triple-letter names in all these files (just introducing n for LoopOrd to keep it distinct from i for LoopId); in part to minimize the diff, and in part to avoid dying on any hills

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
43

The previous code seemed a bit inconsistent about whether to use size_t vs unsigned, so I just picked the one that seemed a bit more common. Easy enough to switch.

This revision was landed with ongoing or failed builds.Mar 14 2023, 11:51 AM
This revision was automatically updated to reflect the committed changes.