This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] extend loop emitter to emit slice driven loops

Authored by Peiming on Jan 30 2023, 1:28 PM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

code cleanup.

Peiming updated this revision to Diff 506697.Mar 20 2023, 1:38 PM

fix check tests.

Peiming updated this revision to Diff 506778.Mar 20 2023, 4:52 PM

code cleanup.

Peiming updated this revision to Diff 507081.Mar 21 2023, 12:16 PM

remove useless code.

Peiming updated this revision to Diff 507176.Mar 21 2023, 4:28 PM

address comments.

Peiming updated this revision to Diff 507478.Mar 22 2023, 12:53 PM

use type aliases for tensor id and level

wrengr added inline comments.Mar 22 2023, 1:49 PM

What do you mean here? I'm guessing this should be "level", but I'm not sure what you mean by "favor(ing) constant levels"


This should remain "/// Exits"


Can you make this class "final" (ditto for the LoopInfo). You don't need/use subclassing here, and marking it final helps the compiler generate more efficient code


I'm guessing this should be LoopOrd. If not, then what is it?


This should be Level lvl (I'm pretty sure)


Should this be LoopOrd? If not, then what is it?


What is this supposed to be: LoopOrd, LoopId, other?

wrengr added inline comments.Mar 22 2023, 2:10 PM

I'd prefer these be defined as static inline functions rather than as macros, since that gives better type-safety and compiler error messages. If you need to use CMPI at several different types, then just use a template.


You should just use ValueRange::getTypes()


Please keep this as Level l


Please use TensorId here. I am just about to upload the CLs that make that into a newtype, so it's important to use the correct type instead of just using size_t/unsigned everywhere


Please define a dyn_cast variant of the getSparseTensorType function, and use that here and everywhere else. The SparseTensorType was created specifically to help avoid several code legibility and correctness concerns, so you should be using it everywhere possible.


You should be using SparseTensorType::getLevelRank here, since it is specifically the level-rank you want not the dim-rank


Please use Level for all levels. Even though it's just a typedef for now, I will be converting it to a proper type in the near future, so you should use the correct type rather than just using unsigned everywhere


I think it'd be clearer to just use const auto & here


Use "l" or "lvl" here. The name "d" is reserved for things of Dimension type, whereas this has Level type.


Why remove the const? It's clearer to know when local variables will never change


Please don't undo my factoring this out into a local variable. The condition is much easier to read when (1) it's all on one line, and (2) avoids repeating common expressions which forces the reader to double check if they are indeed the same or not.


It would be clearer to use break in the then-branch. That keeps you from needing to indent the else-branch (which is very long), and helps the reader avoid needing to check to see if there's something else after the else-branch.

Peiming marked an inline comment as done.Mar 22 2023, 2:20 PM
Peiming added inline comments.

I will stick with this. The reason I use macro is that I want to avoid typing builder and loc.


Okay, it is a mistake I made during rebasing.


This is unsigned index to another array (not the loop sequence). I will stick with this.


This is an unsigned counter.

Peiming updated this revision to Diff 507516.Mar 22 2023, 2:43 PM
Peiming marked 19 inline comments as done.

address comments + fix rebase mistakes.


I mean to pick the a static known dimension size instead of dynamic ones if there are multiple candidate. (i.e., DimOp folds to constant value). It might lead to a slightly better code.


rebase mistake.


I found else is easier to follow, because the control flow is more straight forward.

Peiming updated this revision to Diff 507519.Mar 22 2023, 2:52 PM

remove useless comments.

wrengr added inline comments.Mar 22 2023, 3:35 PM

Please don't undo this variable naming. The "mem" matches other places, and avoids confusion about whether "ptr" means MemRef vs the old "pointer" (now "position") vs llvm-pointers vs...


You should use the numTensors variable instead of calling tensors.size() repeatedly. (This is for code clarity rather than performance reasons)


It would be clearer to combine these together. Also, it would be clearer to use continue rather than extra indentation for the conditional. Putting those together, maybe use something like if (depends == 0) continue; assert(!reassoc); sliceSizes[...


You want to keep the triple-slash "///", since that's what the tooling uses for generating the API documentation

Peiming updated this revision to Diff 507769.Mar 23 2023, 9:39 AM

fix rebase mistakes.

Peiming updated this revision to Diff 507799.Mar 23 2023, 10:30 AM

fix some TODOs.

wrengr added inline comments.Mar 24 2023, 6:15 PM

That's the wrong bound to assert; you should instead compare against levelToDependentIdx[t].size() or maxLvlRank.

For correctness you should also assert(i < numLoops && t < numTensors).

If you rebase over D146684 to get the assertion helpers, then the full assertion would be assert(isValidLevel(t, lvl) && isValidLoopId(i) && !loopToDependencies[i][t].has_value()). The nice thing about those assertion helpers is that it gives the correct bound for the level (rather than falling back to maxLvlRank), and also guards against the case where i == kInvalidId.


Isn't it redundant to store the level-type in loopToDependencies, since it's already stored in lvlTypes? That is, afaict the following code snippet should always return successfully (assuming i and t are valid):

if (const auto dep = loopToDependencies[i][t]) {
  const Level depLvl = (*dep).first;
  const auto depOptLoop = lvlToLoop[t][depLvl];
  const LoopId depLoop = *depOptLoop;
  assert(lvlTypes[t][depLoop] == (*dep).second);

Assuming that's correct, then you shouldn't store the level-type in loopToDependencies because it's redundant information. Or if you absolutely must store the redundant copy for some reason, then you need to verify that the level-type agrees with the one in lvlTypes (or if the one in lvlTypes is undefined, then you need to store the dlt parameter there too; and conversely you need to adjust the setLevelAndType method to also verify consistency with loopToDependencies).

I agree that it's rather convoluted to need to say lvlTypes[t][*(lvlToLoop[t][*(loopToDependencies[i][t])])], but the only solution to that is to reconsider the design of all the fields of the Merger class. For example, if we had lvlTypes : (TensorId, Level) -> LevelType instead of the current lvlTypes : (TensorId, LoopId) -> LevelType, then you wouldn't need to use lvlToLoop there. Of course, you'd have to redo the rest of the Merger code to make that work (which may end up inserting more loopToLevel uses than however many levelToLoop uses it removes). Or a different design would be to combine lvlToLoop and lvlTypes into a single lvlInfo : (TensorId, Level) -> (LevelType, optional<LoopId>); of course that would require making different changes to the rest of the Merger code. In any case, I think it would be wise to wait for D146693 to land before trying to make any of these changes, since the newtypes of that CL will greatly simplify the process of rearranging all these vectors.


This should have been named levelToDependentLoop, since we don't use "idx" in this file anymore because it causes too much confusion. I mentioned this in the previous CL that introduced this field, but you landed the CL without fixing it


This comment should explain how exactly the slicedTids differs from the other tids. Also, is the same tensor allowed to occur in both fields? If so, then what does that mean? and, can the same level of the same tensor occur on both of the corresponding fields?


I think it'd be better to combine these all into a single const SmallVector<LoopLevelInfo> where struct LoopLevelInfo final { TensorId tid; Level lvl; bool isSlice; bool isReduced; }; —assuming it's okay to combine the original (tids,lvls) with the new (slicedTids,slicedLvls,sliceReduced) into a single vector/set. If that's not okay for some reason, then I still think it'd be good to use a single const SmallVector<LoopSlicedLevelInfo> field for the new stuff.

Using AOS will ensure that there's the right number of all the things that should correspond, as well as keeping the corresponding things close together. Plus it'll make it easier to add additional fields in the future as needed.




This comment is wrong for this field


" not need to actually create a sparse..."


"...only need to maintain the..."


Is this actually the full MemRef of coordinates for all levels? If so then it should be "minCoords" (with an "s"). Whereas if it's just a single coordinate for the given level, then it should be "minCrd".


Given the comment, I think this would be better named something like "isFirstSlice" or "isInitialSlice".


If the value is just a single coordinate, then this should also be singular.

wrengr added inline comments.Mar 24 2023, 7:02 PM

Given our discussion of boolean blindness, this assertion suggests that the two parameters should be combined into a single std::optional<std::pair<Level, Value>> parameter. (Albeit you'll still want to assert you didn't get a null Value.)

If that combined parameter doesn't work, why not? The only other thing that would be consistent with the assertion is union{ non-null-Value; struct{non-null-Value; Level}} but that's equivalent to struct{non-null-Value; optional<Level>}, so if that's what you want then you should assert(minCoord) instead.


This description doesn't make sense to me. Do you have a design doc that explains what exactly you mean by "slice" in this context, and explains how/why "reducing d0+d1+d2" translates into needing the two slices you mention?


Either "number of constraints needed to..." or "number of constraints that are needed to..."




That should be "A[i+j] => A[i+2]" to make it clear that it dereives from "j => 2".

Peiming marked an inline comment as done.Mar 27 2023, 9:43 AM
Peiming added inline comments.

Yeah, I haven't rebase against change that introduced maxLvlRank yet.


No, it is not redundant. The lvlTypes current is a mapping for (tid, loopid) => dlt, not (tid, level) => dlt. I think storing a pair makes more sense than introducing a complete (tid, level) => dlt map here because the dlt is only required here when there are non-trivial index expressions on the level.


Sry, probably it get overlooked during rebasing


I agree with you on this, in fact, see my comment at L222, I will do it in a separate patch though.


I will change the comment, it means whether it is the initial tensor that has not yet been sliced.


Yeah, I am writing a paper on this (but still at very early stage), I will share it with you later when it is more or less complete.

Peiming updated this revision to Diff 508772.Mar 27 2023, 1:11 PM
Peiming marked 21 inline comments as done.

rebase + address comments.


It should work, I will add a TODO here and submit the change in a separate patch.

Peiming updated this revision to Diff 508797.Mar 27 2023, 2:26 PM

fix windows building errors.

Peiming updated this revision to Diff 508801.Mar 27 2023, 2:32 PM

fix variable's name.

Peiming updated this revision to Diff 509708.Mar 30 2023, 9:52 AM

split up complicated functions

aartbik added inline comments.Apr 5 2023, 9:25 PM

As a TODO?


Please mark such comments with a TODO, since you clearly have an idea on how to to it better. That way we can periodically grep for TODO's and fix them. Unless you think we will never do that, and then this is a note to self that should not be here.


since you added a parameter, you also need to update this comment


Top level comments usually apply to the block of code, I find e.g.

assert(!loopToDependencies[i][t].has_value()); // must be first definition

a lot more readable, since you follow it direclty with the make pair/pushback


The non-trivial concept is used more widely now, but it would still be nice to define this per file, or at least per first occurrence what non-trivial really means. Or perhaps we should start using more standard terminology on affine expressions?


tensor level with index expression on it, reads awkward

how about

must be a tensor level that contains a non-trivial index expression


If the slice


I find this block of code extremely hard to read.
Any way to factor this into slightly smaller methods and combine these?


A note "make sure" is very ambiguous. Is that a note to self, or something that the code actively does.
Much better is to use an affirmative statement


appears first than normal tensors

appears before normal tensors?


I agree the else is very long and deep

Why not

if (!resolved) {




Ok, this block is where all the magic happens ;-)
I need to do one more careful pass over this...


The next


Sets (or Increment), but use one style


comments with should or must are a bit dangling unless you say what happens when this assumption is not true


Why did you change Exits -> exit? Original seems okay


since we have several nested structs, can you give each a short comment (as documentation, and to improve readability(


Here and below (and above), period at end


const ref?


I would start with the same comment as in the else, and state it in the affirmative rather than the speculative

// End either a for-loop or a while-loop that iterates over a slice.

Peiming updated this revision to Diff 511524.Apr 6 2023, 2:15 PM
Peiming marked 18 inline comments as done.

address some comments.

Peiming updated this revision to Diff 511542.Apr 6 2023, 3:06 PM

simplify code.


better now?

Peiming updated this revision to Diff 511543.Apr 6 2023, 3:11 PM

fix rebase issues.

Peiming updated this revision to Diff 512264.Apr 10 2023, 2:29 PM

fix typos.

aartbik added inline comments.Apr 12 2023, 2:23 PM

can we make this lvl < .... part also a helper method
(that way, all our asserts read almost like english ;-)


This computation does not match my mental interpretation of the text above (L79)


// offset

adds very little, Either use a sentence or remove


why the empty lines here?


We need depends - 1 slices

to make sure you don't read depends as part of the sentence


The comment applies to the assert, but the declaration is in between


Please elaborate. "Pop out" is not at all representative for what follows


Yes, although it could still use a bit more doc on what each block does (on entry of each block).
Also, I would not overuse the "NOTE" part, in principle, all comments are NOTEs and we should only
use them when something really should jump out


isn't that always the case here? Should that not be part of the method description then?


I think this still needs some work to make reading the block easier. The problem is that you have very concise comments in the header
(Generates .....), which is okay, since i don't want to see more there, but very few comments here, where it matters.

So I would still give every implementation function here an entry comment, but one that shows what is generated, using some pseudo-code of the output
That way, on entry of each method, I know what to expect, and dive into the various blocks with more pre-knowledge on what they do



this one seems out of place (all others generate stuff)
perhaps move it up or down in the method order (also in header)


here and a few other place, no period at end, please make once last pass over all new comments here


I think what is still missing is whether it is enforced (viz. asserts fail when trying to set it)
or whether clients are responsible.

So, something like

Clients are responsible for ensuring that the order of the returned

(I think my original comment was really on when I see "should" or "must", who is to blame in the end ;-)


Wren can correct me if I am wrong, but I think this needs to be minimum, right (as in smallest value, and not the lowest according to some other measure)?


is exceeds -> exceeds

but more importantly, I would state this,

We break out of the loop when the coordindate exceeds the slideSize.


the most recent slice (singular)


perhaps we should discuss somewhere else, but we use "unsigned" at most places, and size_t only for local operations, or inside casts and asserts
Since this is part of the API, I would prefer keeping it to unsigned, unless you have very strong reasons for this


as follows?


period at end.

"to allocate top level local"

makes very little sense when read in isolation. Just say what code fragment this points to


I know this was already there, but can we use override here to make it more clear that we implementing the base visitor class?
Or at the very least group all overrides into a

// Visitor method overrides.



has the locate property as well

wrengr added inline comments.Apr 12 2023, 2:53 PM

Why do you need this extra assertion? The isValidLevel assertion already ensures that the lvl is valid for the tensor t. Therefore, rather than checking the lvl twice, the rest of the code should instead maintain the invariant that levelToDependentLoop[t].size() == lvlTypes[t].size().

wrengr added inline comments.Apr 12 2023, 3:56 PM

There is a lot of unnecessary complexity and redundancy in storing all of:

  • lvlToLoop: (tid, lvl) -> optional<loopid>
  • loopToLvl: (tid, loopid) -> optional<lvl>
  • lvlTypes : (tid, loopid) -> dlt
  • loopToDependencies : (loopid, tid) -> (lvl, dlt)
  • levelToDependentLoop : (tid, lvl) -> set<loopid>

As I mentioned in my earlier comment, we can easily reconstruct the desired (tid, lvl) -> dlt map via [](t, l) { auto i = lvlToLoop[t][l]; return i ? lvlTypes[t][*i] : Undef; }. Therefore, we always have that loopToDependencies[i][t] == make_pair(l, reconstructedLvlTypes[t][l]). Consequently: since the first part of loopToDependencies has that every (t,i) pair determines l, it is trivial to construct the required (t,l) pair for passing to reconstructedLvlTypes; and since it's trivial to define reconstructedLvlTypes, therefore there is no benefit to storing this redundant information. And as I said before, whenever we store redundant information that means we must also therefore take pains to ensure that all the copies of that information remain consistent.

I agree that it would be nice to store the (tid, lvl) -> dlt map directly, and to use that in lieu of the current (tid, loopid) -> dlt map. Especially since the former can be quickly constructed from the types of the tensors, and doesn't require knowing anything about lvlToLoop/loopToLvl. However, regardless of which one we store, the point remains the same: there's no benefit to loopToDependencies storing this information redundantly, and if it stores redundant information anyways then it needs to ensure that it remains consistent with the (tid, {lvl,loopid}) -> dlt map.

Peiming updated this revision to Diff 513023.Apr 12 2023, 5:35 PM
Peiming marked 22 inline comments as done.

address comments.


I found this is actually a redundant check, if the lvl is valid then it is definitely inbound.


I added a comment, this is non-vritual function, so I did not use override here.

Peiming marked 2 inline comments as done.Apr 12 2023, 5:38 PM
Peiming added inline comments.

I agree that we can probably clean it up, but I will address this in separate patches through ;-)

aartbik accepted this revision.Apr 12 2023, 5:55 PM
aartbik added inline comments.

I first though this was commented out code ;-)

So make it


This revision is now accepted and ready to land.Apr 12 2023, 5:55 PM
Peiming updated this revision to Diff 513049.Apr 12 2023, 8:09 PM
Peiming marked an inline comment as done.

address comment.

Peiming updated this revision to Diff 513052.Apr 12 2023, 8:29 PM

fix test case memory leakage.

This revision was landed with ongoing or failed builds.Apr 12 2023, 8:29 PM
This revision was automatically updated to reflect the committed changes.

Hi @Peiming, the buildbots are failing (e.g. - could you please fix it?

Hi @Peiming, the buildbots are failing (e.g. - could you please fix it?

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

Hi @Peiming, the buildbots are failing (e.g. - could you please fix it?

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

@vzakhari this is not the patch that triggers the complaining. If you are going to revert, please make sure you revert the right one, which is

Hi @Peiming, the buildbots are failing (e.g. - could you please fix it?

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

I can not see any related file in D148565 either....

@vzakhari I do not think my patch caused the error, see, there was already the same warning (but I do not know why it was not treated as errors).

Hi @Peiming, the buildbots are failing (e.g. - could you please fix it?

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

I can not see any related file in D148565 either....

let me explain what I see: I looked at and the first failing build was 19162:; it points to D148565. The build issue is shown in stdio section:

FAILED: tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_CUDA_CONVERSIONS_ENABLED=0 -DMLIR_ROCM_CONVERSIONS_ENABLED=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/tools/mlir/lib/Dialect/SparseTensor/Transforms -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/lib/Dialect/SparseTensor/Transforms -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/llvm/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/tools/mlir/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o -MF tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o.d -o tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
In file included from ../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp:17:
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h: In member function ‘constexpr mlir::sparse_tensor::TensorLevel mlir::sparse_tensor::LoopEmitter::makeTensorLevel(mlir::sparse_tensor::TensorId, mlir::sparse_tensor::Level) const’:
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h:199:29: error: call to non-‘constexpr’ function ‘unsigned int mlir::sparse_tensor::LoopEmitter::getNumTensors() const’
  199 |     return l * getNumTensors() + t;
      |                ~~~~~~~~~~~~~^~
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h:195:12: note: ‘unsigned int mlir::sparse_tensor::LoopEmitter::getNumTensors() const’ declared here
  195 |   unsigned getNumTensors() const { return tensors.size(); }
      |            ^~~~~~~~~~~~~
90.309 [1754/1/4349] Linking CXX shared library lib/
ninja: build stopped: subcommand failed.

So it does point to the change from D148565.

Peiming added a comment.EditedMay 4 2023, 10:21 AM

Thanks! Now I see it! should fix it.