This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
444 ↗	(On Diff #507478)	What do you mean here? I'm guessing this should be "level", but I'm not sure what you mean by "favor(ing) constant levels"
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
113	This should remain "/// Exits"
208–228	Can you make this class "`final`" (ditto for the `LoopInfo`). You don't need/use subclassing here, and marking it final helps the compiler generate more efficient code
212	I'm guessing this should be `LoopOrd`. If not, then what is it?
257	This should be `Level lvl` (I'm pretty sure)
327	Should this be `LoopOrd`? If not, then what is it?
349	What is this supposed to be: `LoopOrd`, `LoopId`, other?

wrengr added inline comments. · Mar 22 2023, 2:10 PM

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
35–39	I'd prefer these be defined as `static inline` functions rather than as macros, since that gives better type-safety and compiler error messages. If you need to use CMPI at several different types, then just use a template.
43–45	You should just use `ValueRange::getTypes()`
255–256	Please keep this as `Level l`
364	Please use `TensorId` here. I am just about to upload the CLs that make that into a newtype, so it's important to use the correct type instead of just using `size_t`/`unsigned` everywhere
365	Please define a `dyn_cast` variant of the `getSparseTensorType` function, and use that here and everywhere else. The `SparseTensorType` was created specifically to help avoid several code legibility and correctness concerns, so you should be using it everywhere possible.
369	You should be using `SparseTensorType::getLevelRank` here, since it is specifically the level-rank you want not the dim-rank
370	Please use `Level` for all levels. Even though it's just a typedef for now, I will be converting it to a proper type in the near future, so you should use the correct type rather than just using `unsigned` everywhere
450	I think it'd be clearer to just use `const auto &` here
604	Use "l" or "lvl" here. The name "d" is reserved for things of `Dimension` type, whereas this has `Level` type.
780–781	Why remove the const? It's clearer to know when local variables will never change
782	Please don't undo my factoring this out into a local variable. The condition is much easier to read when (1) it's all on one line, and (2) it avoids repeating common expressions, which would force the reader to double-check whether they are indeed the same.
1115	It would be clearer to use `break` in the then-branch. That keeps you from needing to indent the else-branch (which is very long), and helps the reader avoid needing to check to see if there's something else after the else-branch.

Peiming marked an inline comment as done. · Mar 22 2023, 2:20 PM

Peiming added inline comments.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
35–39	I will stick with this. The reason I use a macro is to avoid repeatedly typing `builder` and `loc`.
782	Okay, it is a mistake I made during rebasing.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
327	This is an `unsigned` index into another array (not the loop sequence). I will stick with this.
349	This is an `unsigned` counter.
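A minimal sketch of the `static inline` alternative suggested for the CMPI macros (LoopEmitter.cpp lines 35–39), combined with a local lambda that addresses the concern about retyping `builder` and `loc`. `OpBuilder`, `Location`, `Value`, `createCmpI`, `cmpi` and `genBoundsChecks` are hypothetical stand-ins so the example compiles without MLIR; real code would presumably create an `arith::CmpIOp` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-ins for mlir::OpBuilder, mlir::Location and mlir::Value.
// They are NOT the real MLIR classes.
struct Location {
  std::string file;
};
struct Value {
  int64_t id;
};
enum class CmpIPredicate { eq, ne, ult };

struct OpBuilder {
  int64_t numOps = 0;
  // Pretends to create a comparison op and returns its (fake) result.
  Value createCmpI(const Location &, CmpIPredicate, Value, Value) {
    return Value{numOps++};
  }
};

// Typed replacement for a CMPI(p, lhs, rhs) macro: every argument is
// type-checked, and diagnostics point here instead of at a macro expansion.
template <CmpIPredicate P>
static inline Value cmpi(OpBuilder &builder, const Location &loc, Value lhs,
                         Value rhs) {
  return builder.createCmpI(loc, P, lhs, rhs);
}

// Emits `lo < crd` and `crd < hi`. The local lambda captures `builder` and
// `loc` once, so call sites stay as short as with the macro.
static std::pair<Value, Value> genBoundsChecks(OpBuilder &builder,
                                               const Location &loc, Value crd,
                                               Value lo, Value hi) {
  auto ult = [&](Value l, Value r) {
    return cmpi<CmpIPredicate::ult>(builder, loc, l, r);
  };
  // Elements of a braced initializer are evaluated left to right.
  return {ult(lo, crd), ult(crd, hi)};
}
```

The lambda keeps the brevity of the macro while staying an ordinary, type-checked C++ entity.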

address comments + fix rebase mistakes.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
444 ↗	(On Diff #507478)	I mean picking a statically known dimension size instead of a dynamic one when there are multiple candidates (i.e., a `DimOp` that folds to a constant value). It might lead to slightly better code.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
780–781	rebase mistake.
1115	I find the `else` easier to follow because the control flow is more straightforward.

remove useless comments.

wrengr added inline comments. · Mar 22 2023, 3:35 PM

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
44	Please don't undo this variable naming. The "mem" matches other places, and avoids confusion about whether "ptr" means `MemRef` vs the old "pointer" (now "position") vs llvm-pointers vs...
237–242	You should use the `numTensors` variable instead of calling `tensors.size()` repeatedly. (This is for code clarity rather than performance reasons)
255–256	It would be clearer to combine these together. Also, it would be clearer to use `continue` rather than extra indentation for the conditional. Putting those together, maybe use something like `if (depends == 0) continue; assert(!reassoc); sliceSizes[...`
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
113	You want to keep the triple-slash "///", since that's what the tooling uses for generating the API documentation

Harbormaster completed remote builds in B221147: Diff 507519. · Mar 22 2023, 3:53 PM

fix rebase mistakes.

Harbormaster completed remote builds in B221334: Diff 507769. · Mar 23 2023, 10:02 AM

fix some TODOs.

Harbormaster completed remote builds in B221355: Diff 507799. · Mar 23 2023, 11:26 AM

wrengr added inline comments. · Mar 24 2023, 6:15 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
477 ↗	(On Diff #507799)	That's the wrong bound to assert; you should instead compare against `levelToDependentIdx[t].size()` or `maxLvlRank`. For correctness you should also `assert(i < numLoops && t < numTensors)`. If you rebase over D146684 to get the assertion helpers, then the full assertion would be `assert(isValidLevel(t, lvl) && isValidLoopId(i) && !loopToDependencies[i][t].has_value())`. The nice thing about those assertion helpers is that it gives the correct bound for the level (rather than falling back to `maxLvlRank`), and also guards against the case where `i == kInvalidId`.
478 ↗	(On Diff #507799)	Isn't it redundant to store the level-type in `loopToDependencies`, since it's already stored in `lvlTypes`? That is, afaict the following code snippet should always return successfully (assuming `i` and `t` are valid): `if (const auto dep = loopToDependencies[i][t]) { const Level depLvl = (*dep).first; const auto depOptLoop = lvlToLoop[t][depLvl]; assert(depOptLoop); const LoopId depLoop = *depOptLoop; assert(lvlTypes[t][depLoop] == (*dep).second); }` Assuming that's correct, then you shouldn't store the level-type in `loopToDependencies` because it's redundant information. Or if you absolutely must store the redundant copy for some reason, then you need to verify that the level-type agrees with the one in `lvlTypes` (or if the one in `lvlTypes` is undefined, then you need to store the `dlt` parameter there too; and conversely you need to adjust the `setLevelAndType` method to also verify consistency with `loopToDependencies`). I agree that it's rather convoluted to need to say `lvlTypes[t][(lvlToLoop[t][*(loopToDependencies[i][t])])]`, but the only solution to that is to reconsider the design of all the fields of the `Merger` class. For example, if we had `lvlTypes : (TensorId, Level) -> LevelType` instead of the current `lvlTypes : (TensorId, LoopId) -> LevelType`, then you wouldn't need to use `lvlToLoop` there. Of course, you'd have to redo the rest of the `Merger` code to make that work (which may end up inserting more `loopToLevel` uses than however many `levelToLoop` uses it removes). Or a different design would be to combine `lvlToLoop` and `lvlTypes` into a single `lvlInfo : (TensorId, Level) -> (LevelType, optional<LoopId>)`; of course that would require making different changes to the rest of the `Merger` code. In any case, I think it would be wise to wait for D146693 to land before trying to make any of these changes, since the newtypes of that CL will greatly simplify the process of rearranging all these vectors.
598 ↗	(On Diff #507799)	This should have been named `levelToDependentLoop`, since we don't use "idx" in this file anymore because it causes too much confusion. I mentioned this in the previous CL that introduced this field, but you landed the CL without fixing it.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
196	This comment should explain how exactly the `slicedTids` differs from the other `tids`. Also, is the same tensor allowed to occur in both fields? If so, then what does that mean? and, can the same level of the same tensor occur on both of the corresponding fields?
197–201	I think it'd be better to combine these all into a single `const SmallVector<LoopLevelInfo>` where `struct LoopLevelInfo final { TensorId tid; Level lvl; bool isSlice; bool isReduced; };` —assuming it's okay to combine the original `(tids,lvls)` with the new `(slicedTids,slicedLvls,sliceReduced)` into a single vector/set. If that's not okay for some reason, then I still think it'd be good to use a single `const SmallVector<LoopSlicedLevelInfo>` field for the new stuff. Using AOS will ensure that there's the right number of all the things that should correspond, as well as keeping the corresponding things close together. Plus it'll make it easier to add additional fields in the future as needed.
198	"levels"
200	This comment is wrong for this field
209	"...do not need to actually create a sparse..."
210	"...only need to maintain the..."
211	Is this actually the full `MemRef` of coordinates for all levels? If so then it should be "minCoords" (with an "s"). Whereas if it's just a single coordinate for the given level, then it should be "minCrd".
219	Given the comment, I think this would be better named something like "isFirstSlice" or "isInitialSlice".
221	If the value is just a single coordinate, then this should also be singular.
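The array-of-structs layout suggested for LoopEmitter.h lines 197–201 can be sketched as follows. The field names follow the comment; `LoopInfoAoS`, `numSlices`, and the use of `std::vector` instead of `llvm::SmallVector` are assumptions made to keep the example self-contained. Each (tensor, level) entry carries its own flags, so there are no parallel vectors that can drift out of sync.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using TensorId = unsigned; // simplified stand-ins for the real id types
using Level = unsigned;

// One record per (tensor, level) driven by a loop, as proposed in the review.
struct LoopLevelInfo final {
  TensorId tid;
  Level lvl;
  bool isSlice;   // driven by a slice rather than by the full level
  bool isReduced; // corresponds to the old `sliceReduced` flag
};

// Hypothetical container replacing parallel vectors such as
// (tids, lvls) and (slicedTids, slicedLvls, sliceReduced).
struct LoopInfoAoS {
  explicit LoopInfoAoS(std::vector<LoopLevelInfo> infos)
      : levels(std::move(infos)) {}

  // Counts how many of the levels are slice-driven.
  std::size_t numSlices() const {
    std::size_t n = 0;
    for (const LoopLevelInfo &info : levels)
      if (info.isSlice)
        ++n;
    return n;
  }

  const std::vector<LoopLevelInfo> levels;
};
```

Adding another per-level property later means adding one field, not one more vector that every constructor must keep the same length.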

wrengr added inline comments. · Mar 24 2023, 7:02 PM

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
215	Given our discussion of boolean blindness, this assertion suggests that the two parameters should be combined into a single `std::optional<std::pair<Level, Value>>` parameter. (Albeit you'll still want to assert you didn't get a null `Value`.) If that combined parameter doesn't work, why not? The only other thing that would be consistent with the assertion is `union{ non-null-Value; struct{non-null-Value; Level}}` but that's equivalent to `struct{non-null-Value; optional<Level>}`, so if that's what you want then you should `assert(minCoord)` instead.
296–298	This description doesn't make sense to me. Do you have a design doc that explains what exactly you mean by "slice" in this context, and explains how/why "reducing `d0+d1+d2`" translates into needing the two slices you mention?
302	Either "number of constraints needed to..." or "number of constraints that are needed to..."
303	"level"
308	That should be "A[i+j] => A[i+2]" to make it clear that it derives from "j => 2".
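The combined parameter suggested for LoopEmitter.h line 215 can be sketched as below. The comment does not say what the two original parameters are, so `HypotheticalSliceInfo`, `levelAndValue`, and `hasLevel` are made-up names, and `Value` is a stand-in for `mlir::Value` whose null state converts to `false`. The level and the value are now present or absent together, so only the non-null check remains as an assertion.

```cpp
#include <cassert>
#include <optional>
#include <utility>

using Level = unsigned;

// Stand-in for mlir::Value: a null handle converts to false.
struct Value {
  const void *impl = nullptr;
  explicit operator bool() const { return impl != nullptr; }
};

// Instead of two separate parameters plus an assertion tying their validity
// together, take a single optional (level, value) pair.
struct HypotheticalSliceInfo {
  explicit HypotheticalSliceInfo(std::optional<std::pair<Level, Value>> lv)
      : levelAndValue(std::move(lv)) {
    // The type already rules out "level without value"; only null-ness of
    // the Value still needs a runtime check.
    assert(!levelAndValue || static_cast<bool>(levelAndValue->second));
  }

  bool hasLevel() const { return levelAndValue.has_value(); }

  std::optional<std::pair<Level, Value>> levelAndValue;
};
```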

Peiming marked an inline comment as done. · Mar 27 2023, 9:43 AM

Peiming added inline comments.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
477 ↗	(On Diff #507799)	Yeah, I haven't rebased against the change that introduced `maxLvlRank` yet.
478 ↗	(On Diff #507799)	No, it is not redundant. `lvlTypes` is currently a mapping `(tid, loopid) => dlt`, not `(tid, level) => dlt`. I think storing a pair makes more sense than introducing a complete `(tid, level) => dlt` map here, because the dlt is only required when there are non-trivial index expressions on the level.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
44	Sorry, it probably got overlooked during rebasing.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
197–201	I agree with you on this; in fact, see my comment at L222. I will do it in a separate patch, though.
219	I will change the comment, it means whether it is the initial tensor that has not yet been sliced.
296–298	Yeah, I am writing a paper on this (though it is still at a very early stage). I will share it with you later, once it is more or less complete.

rebase + address comments.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
215	It should work, I will add a TODO here and submit the change in a separate patch.

Harbormaster completed remote builds in B222096: Diff 508772. · Mar 27 2023, 1:36 PM

fix windows building errors.

fix variable's name.

Harbormaster completed remote builds in B222114: Diff 508801. · Mar 27 2023, 3:05 PM

rebase.

split up complicated functions

Harbormaster completed remote builds in B222774: Diff 509708. · Mar 30 2023, 10:46 AM

rebase.

Harbormaster completed remote builds in B223612: Diff 510849. · Apr 4 2023, 10:50 AM

Peiming added a child revision: D147550: [mlir][sparse] implement index redution on dense level (for CSR). · Apr 4 2023, 11:23 AM

aartbik added inline comments. · Apr 5 2023, 9:25 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
444 ↗	(On Diff #507478)	As a TODO?
444 ↗	(On Diff #507478)	Please mark such comments with a TODO, since you clearly have an idea of how to do it better. That way we can periodically grep for TODOs and fix them. Unless you think we will never do that, in which case this is a note to self that should not be here.
453 ↗	(On Diff #510849)	since you added a parameter, you also need to update this comment
458 ↗	(On Diff #510849)	Top-level comments usually apply to the whole block of code. I find, e.g., `assert(!loopToDependencies[i][t].has_value()); // must be first definition` a lot more readable, since you follow it directly with the make_pair/push_back.
470 ↗	(On Diff #510849)	The non-trivial concept is used more widely now, but it would still be nice to define what "non-trivial" really means, per file or at least at its first occurrence. Or perhaps we should start using more standard terminology for affine expressions?
503 ↗	(On Diff #510849)	"tensor level with index expression on it" reads awkwardly; how about "must be a tensor level that contains a non-trivial index expression"?
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
633	If the slice
635	I find this block of code extremely hard to read. Any way to factor this into slightly smaller methods and combine these?
772	A note "make sure" is very ambiguous. Is that a note to self, or something that the code actively does? It is much better to use an affirmative statement.
773	"appears first than normal tensors" -> "appears before normal tensors"?
1115	I agree the else is very long and deep. Why not `if (!resolved) { genSlice; continue; } ...`?
1167	Ok, this block is where all the magic happens ;-) I need to do one more careful pass over this...
1638	The next
1716	Sets (or Increment), but use one style
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
60	comments with should or must are a bit dangling unless you say what happens when this assumption is not true
113	Why did you change Exits -> exit? Original seems okay
180–181	Since we have several nested structs, can you give each a short comment (as documentation, and to improve readability)?
196	Here and below (and above), period at end
300	const ref?
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
1536	I would start with the same comment as in the else, and state it in the affirmative rather than the speculative: `// End either a for-loop or a while-loop that iterates over a slice.`
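The early-`continue` restructuring suggested for LoopEmitter.cpp line 1115 looks roughly like this. `PendingLevel`, `Counters`, and `processLevels` are hypothetical; the counter increments stand in for `genSlice` and for the long else-branch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-level record; not the real LoopEmitter data structure.
struct PendingLevel {
  bool resolved;
};

struct Counters {
  int slicesGenerated = 0;
  int levelsProcessed = 0;
};

// Instead of
//   if (!resolved) { genSlice(); } else { /* long body */ }
// handle the short case first and `continue`, so the long body is not
// nested and the reader can see that nothing follows it.
Counters processLevels(const std::vector<PendingLevel> &levels) {
  Counters c;
  for (const PendingLevel &lvl : levels) {
    if (!lvl.resolved) {
      ++c.slicesGenerated; // stands in for "genSlice"
      continue;
    }
    // ... the formerly indented long else-body goes here ...
    ++c.levelsProcessed;
  }
  return c;
}
```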

address some comments.

Herald added a subscriber: bviyer. · Apr 6 2023, 2:15 PM

Harbormaster completed remote builds in B224103: Diff 511524. · Apr 6 2023, 2:32 PM

simplify code.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
635	better now?

fix rebase issues.

Harbormaster completed remote builds in B224118: Diff 511543. · Apr 6 2023, 3:29 PM

fix typos.

Harbormaster completed remote builds in B224648: Diff 512264. · Apr 10 2023, 2:50 PM

aartbik added inline comments. · Apr 12 2023, 2:23 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
457 ↗	(On Diff #512264)	Can we make this `lvl < ....` part a helper method too? (That way, all our asserts read almost like English ;-)
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
71	This computation does not match my mental interpretation of the text above (L79)
75	`// offset` adds very little. Either use a sentence or remove it.
240	why the empty lines here?
256	We need `depends - 1` slices to make sure you don't read depends as part of the sentence
372	The comment applies to the assert, but the declaration is in between
452	Please elaborate. "Pop out" is not at all representative for what follows
635	Yes, although it could still use a bit more doc on what each block does (on entry of each block). Also, I would not overuse the "NOTE" part, in principle, all comments are NOTEs and we should only use them when something really should jump out
719	isn't that always the case here? Should that not be part of the method description then?
1170	I think this still needs some work to make the block easier to read. The problem is that you have very concise comments in the header (Generates .....), which is okay, since I don't want to see more there, but very few comments here, where it matters. So I would still give every implementation function here an entry comment, but one that shows what is generated, using some pseudo-code of the output. That way, on entry to each method, I know what to expect, and can dive into the various blocks with more pre-knowledge of what they do. WDYT?
1183	This one seems out of place (all the others generate code). Perhaps move it up or down in the method order (also in the header).
1240	Here and in a few other places there is no period at the end. Please make one last pass over all new comments here.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
60	I think what is still missing is whether it is enforced (viz. asserts fail when trying to set it) or whether clients are responsible. So, something like "Clients are responsible for ensuring that the order of the returned ..." (I think my original comment was really about this: when I see "should" or "must", who is to blame in the end? ;-)
225	Wren can correct me if I am wrong, but I think this needs to be minimum, right (as in smallest value, and not the lowest according to some other measure)?
264	"is exceeds" -> "exceeds". But more importantly, I would state this as: "We break out of the loop when the coordinate exceeds the sliceSize."
296	the most recent slice (singular)
304	Perhaps we should discuss this somewhere else, but we use `unsigned` in most places, and `size_t` only for local operations or inside casts and asserts. Since this is part of the API, I would prefer keeping it `unsigned`, unless you have very strong reasons for this.
362	Period at end. "to allocate top level local" makes very little sense when read in isolation. Just say what code fragment this points to.
368	as follows?
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
62	I know this was already there, but can we use override here to make it more clear that we implementing the base visitor class? Or at the very least group all overrides into a // Visitor method overrides. ... section?
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp
418 ↗	(On Diff #512264)	has the `locate` property as well

wrengr added inline comments. · Apr 12 2023, 2:53 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
457 ↗	(On Diff #512264)	Why do you need this extra assertion? The `isValidLevel` assertion already ensures that the `lvl` is valid for the tensor `t`. Therefore, rather than checking the `lvl` twice, the rest of the code should instead maintain the invariant that `levelToDependentLoop[t].size() == lvlTypes[t].size()`.
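The assertion helpers discussed for Merger.h (`isValidLevel`, `isValidLoopId`) can be sketched as follows, under simplifying assumptions: `MiniMerger`, its constructor, `setLoopDependentTensorLevel`, and `getDependency` are hypothetical, `loopToDependencies` stores only the level here, and `kInvalidId` is assumed to be the maximum `unsigned` value. The level bound comes from each tensor's own level-rank rather than a global maximum, which is why an extra `lvl < ...size()` check would be redundant.

```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the Merger's id types.
using TensorId = unsigned;
using LoopId = unsigned;
using Level = unsigned;
constexpr unsigned kInvalidId = std::numeric_limits<unsigned>::max();

class MiniMerger {
public:
  MiniMerger(std::vector<Level> ranks, unsigned nLoops)
      : lvlRanks(std::move(ranks)), numLoops(nLoops),
        loopToDependencies(nLoops,
                           std::vector<std::optional<Level>>(lvlRanks.size())) {}

  unsigned getNumTensors() const {
    return static_cast<unsigned>(lvlRanks.size());
  }
  bool isValidTensorId(TensorId t) const { return t < getNumTensors(); }
  // Rejects kInvalidId explicitly, in addition to the range check.
  bool isValidLoopId(LoopId i) const { return i != kInvalidId && i < numLoops; }
  // Uses the level-rank of tensor `t` itself, not a global maximum.
  bool isValidLevel(TensorId t, Level lvl) const {
    return isValidTensorId(t) && lvl < lvlRanks[t];
  }

  // Records that level `lvl` of tensor `t` depends on loop `i` (once only).
  void setLoopDependentTensorLevel(LoopId i, TensorId t, Level lvl) {
    assert(isValidLevel(t, lvl) && isValidLoopId(i) &&
           !loopToDependencies[i][t].has_value());
    loopToDependencies[i][t] = lvl;
  }

  std::optional<Level> getDependency(LoopId i, TensorId t) const {
    assert(isValidLoopId(i) && isValidTensorId(t));
    return loopToDependencies[i][t];
  }

private:
  std::vector<Level> lvlRanks; // level-rank of each tensor
  unsigned numLoops;
  // (loop, tensor) -> dependent level, if any.
  std::vector<std::vector<std::optional<Level>>> loopToDependencies;
};
```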

wrengr added inline comments. · Apr 12 2023, 3:56 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
478 ↗	(On Diff #507799)	There is a lot of unnecessary complexity and redundancy in storing all of: `lvlToLoop: (tid, lvl) -> optional<loopid>` `loopToLvl: (tid, loopid) -> optional<lvl>` `lvlTypes : (tid, loopid) -> dlt` `loopToDependencies : (loopid, tid) -> (lvl, dlt)` `levelToDependentLoop : (tid, lvl) -> set<loopid>` As I mentioned in my earlier comment, we can easily reconstruct the desired `(tid, lvl) -> dlt` map via `[](t, l) { auto i = lvlToLoop[t][l]; return i ? lvlTypes[t][*i] : Undef; }`. Therefore, we always have that `loopToDependencies[i][t] == make_pair(l, reconstructedLvlTypes[t][l])`. Consequently: since the first part of `loopToDependencies` has that every `(t,i)` pair determines `l`, it is trivial to construct the required `(t,l)` pair for passing to `reconstructedLvlTypes`; and since it's trivial to define `reconstructedLvlTypes`, therefore there is no benefit to storing this redundant information. And as I said before, whenever we store redundant information that means we must also therefore take pains to ensure that all the copies of that information remain consistent. I agree that it would be nice to store the `(tid, lvl) -> dlt` map directly, and to use that in lieu of the current `(tid, loopid) -> dlt` map. Especially since the former can be quickly constructed from the types of the tensors, and doesn't require knowing anything about `lvlToLoop`/`loopToLvl`. However, regardless of which one we store, the point remains the same: there's no benefit to `loopToDependencies` storing this information redundantly, and if it stores redundant information anyways then it needs to ensure that it remains consistent with the `(tid, {lvl,loopid}) -> dlt` map.
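The claim that the `(tid, lvl) -> dlt` map can be derived, and that any redundant copy must stay consistent with it, can be sketched with plain vectors. The types are simplified stand-ins (`LevelType` instead of `DimLevelType`), and `MiniMergerMaps`, `reconstructedLvlType`, and `dependenciesConsistent` are hypothetical names; `lvlToLoop`, `lvlTypes`, and `loopToDependencies` follow the shapes listed in the comment.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

using TensorId = unsigned; // simplified stand-ins for the real id types
using LoopId = unsigned;
using Level = unsigned;
enum class LevelType { Undef, Dense, Compressed };

struct MiniMergerMaps {
  // lvlToLoop: (tid, lvl) -> optional<loopid>
  std::vector<std::vector<std::optional<LoopId>>> lvlToLoop;
  // lvlTypes: (tid, loopid) -> level-type
  std::vector<std::vector<LevelType>> lvlTypes;
  // loopToDependencies: (loopid, tid) -> optional<(lvl, level-type)>
  std::vector<std::vector<std::optional<std::pair<Level, LevelType>>>>
      loopToDependencies;

  // The derived (tid, lvl) -> level-type map from the review comment.
  LevelType reconstructedLvlType(TensorId t, Level l) const {
    const std::optional<LoopId> i = lvlToLoop[t][l];
    return i ? lvlTypes[t][*i] : LevelType::Undef;
  }

  // The invariant that a redundant copy in loopToDependencies must maintain.
  bool dependenciesConsistent() const {
    for (LoopId i = 0; i < loopToDependencies.size(); ++i)
      for (TensorId t = 0; t < loopToDependencies[i].size(); ++t)
        if (const auto &dep = loopToDependencies[i][t])
          if (dep->second != reconstructedLvlType(t, dep->first))
            return false;
    return true;
  }
};
```

If the level-type is not stored in `loopToDependencies` at all, `dependenciesConsistent` has nothing left to check, which is the point of the comment.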

address comments.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
457 ↗	(On Diff #512264)	I found this is actually a redundant check: if the `lvl` is valid, then it is definitely in bounds.
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
62	I added a comment. This is a non-virtual function, so I did not use `override` here.
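The reply above can be illustrated with a CRTP visitor, the style used by MLIR's `AffineExprVisitor` (assuming the class at Sparsification.cpp line 62 derives from such a visitor). The base class dispatches statically to the derived class, so the derived `visit*` methods are ordinary non-virtual methods that hide the base defaults, and marking them `override` would not compile. `ExprVisitorBase`, `Expr`, `ExprKind`, and `KindNamer` are hypothetical.

```cpp
#include <cassert>
#include <string>

enum class ExprKind { Dim, Constant, Add };
struct Expr {
  ExprKind kind;
};

// CRTP base: dispatches to Derived's methods via static_cast, no vtable.
template <typename Derived, typename Result>
class ExprVisitorBase {
public:
  Result visit(const Expr &e) {
    auto *self = static_cast<Derived *>(this);
    switch (e.kind) {
    case ExprKind::Dim:
      return self->visitDimExpr(e);
    case ExprKind::Constant:
      return self->visitConstantExpr(e);
    case ExprKind::Add:
      return self->visitAddExpr(e);
    }
    return Result();
  }

  // Default handlers. A derived class replaces one by declaring a method with
  // the same name, which hides the base version; nothing here is virtual.
  Result visitDimExpr(const Expr &) { return Result(); }
  Result visitConstantExpr(const Expr &) { return Result(); }
  Result visitAddExpr(const Expr &) { return Result(); }
};

class KindNamer : public ExprVisitorBase<KindNamer, std::string> {
public:
  // Visitor method overrides (non-virtual, so `override` would not compile).
  std::string visitDimExpr(const Expr &) { return "dim"; }
  std::string visitAddExpr(const Expr &) { return "add"; }
};
```

Grouping such methods under a "Visitor method overrides" comment, as suggested in the review, is then the only way to signal intent.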

Peiming marked 2 inline comments as done. · Apr 12 2023, 5:38 PM

Peiming added inline comments.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
478 ↗	(On Diff #507799)	I agree that we can probably clean it up, but I will address this in separate patches, though ;-)

aartbik accepted this revision. · Apr 12 2023, 5:55 PM

aartbik added inline comments.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
1423	I first thought this was commented-out code ;-) So make it "Generate: code".

This revision is now accepted and ready to land. · Apr 12 2023, 5:55 PM

Harbormaster completed remote builds in B225219: Diff 513023. · Apr 12 2023, 6:00 PM

address comment.

Harbormaster completed remote builds in B225240: Diff 513049. · Apr 12 2023, 8:23 PM

fix test case memory leakage.

This revision was landed with ongoing or failed builds. · Apr 12 2023, 8:29 PM

Closed by commit rG5fd9d801350d: [mlir][sparse] extend loop emitter to emit slice driven loops (authored by Peiming).

This revision was automatically updated to reflect the committed changes.

Peiming added a commit: rG5fd9d801350d: [mlir][sparse] extend loop emitter to emit slice driven loops.

Harbormaster completed remote builds in B225243: Diff 513052. · Apr 12 2023, 8:45 PM

Peiming mentioned this in D148565: [mlir][sparse] group tensor id and levels into pairs in loop emitter. · Apr 17 2023, 1:10 PM

Peiming mentioned this in rG36c95ee739c0: [mlir][sparse] group tensor id and levels into pairs in loop emitter. · May 4 2023, 9:15 AM

Hi @Peiming, the buildbots are failing (e.g. https://lab.llvm.org/buildbot/#/builders/160/builds/19165) - could you please fix it?

In D142930#4319290, @vzakhari wrote:

Hi @Peiming, the buildbots are failing (e.g. https://lab.llvm.org/buildbot/#/builders/160/builds/19165) - could you please fix it?

Yeah, I saw it, but the warning seems to be unrelated to this change... I will take a look.

In D142930#4319294, @Peiming wrote:

Yeah, I saw it, but the warning seems to be unrelated to this change... I will take a look.

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

@vzakhari this is not the patch that triggers the complaint. If you are going to revert, please make sure you revert the right one, which is https://reviews.llvm.org/D148565

In D142930#4319326, @vzakhari wrote:

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

I cannot see any related file in D148565 either...

@vzakhari I do not think my patch caused the error, see https://lab.llvm.org/buildbot/#/builders/160/builds/19161, there was already the same warning (but I do not know why it was not treated as an error).

In D142930#4319341, @Peiming wrote:

I cannot see any related file in D148565 either...

Let me explain what I see: I looked at https://lab.llvm.org/buildbot/#/builders/160 and the first failing build was 19162: https://lab.llvm.org/buildbot/#/builders/160/builds/19162; it points to D148565. The build issue is shown in the stdio section:

FAILED: tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_CUDA_CONVERSIONS_ENABLED=0 -DMLIR_ROCM_CONVERSIONS_ENABLED=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/tools/mlir/lib/Dialect/SparseTensor/Transforms -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/lib/Dialect/SparseTensor/Transforms -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/llvm/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/tools/mlir/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o -MF tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o.d -o tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
In file included from ../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp:17:
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h: In member function ‘constexpr mlir::sparse_tensor::TensorLevel mlir::sparse_tensor::LoopEmitter::makeTensorLevel(mlir::sparse_tensor::TensorId, mlir::sparse_tensor::Level) const’:
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h:199:29: error: call to non-‘constexpr’ function ‘unsigned int mlir::sparse_tensor::LoopEmitter::getNumTensors() const’
  199 |     return l * getNumTensors() + t;
      |                ~~~~~~~~~~~~~^~
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h:195:12: note: ‘unsigned int mlir::sparse_tensor::LoopEmitter::getNumTensors() const’ declared here
  195 |   unsigned getNumTensors() const { return tensors.size(); }
      |            ^~~~~~~~~~~~~
90.309 [1754/1/4349] Linking CXX shared library lib/libclang-cpp.so.17git
ninja: build stopped: subcommand failed.

So it does point to the change from D148565.

Thanks! Now I see it! https://reviews.llvm.org/D149874 should fix it.
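The GCC error in the log comes from a `constexpr` member function (`makeTensorLevel`) calling the non-`constexpr` `getNumTensors()`, whose result depends on a runtime `std::vector` size. A minimal sketch of one way to avoid it is to drop `constexpr`; the actual change in D149874 is not shown here and may differ. `MiniLoopEmitter` and `unpackTensorLevel` (an inverse of the `l * getNumTensors() + t` encoding) are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using TensorId = unsigned; // simplified stand-ins for the real id types
using Level = unsigned;
using TensorLevel = unsigned;

class MiniLoopEmitter {
public:
  explicit MiniLoopEmitter(unsigned numTensors) : tensors(numTensors) {}

  unsigned getNumTensors() const {
    return static_cast<unsigned>(tensors.size());
  }

  // Declared without `constexpr`: getNumTensors() is not constexpr (it reads
  // a runtime std::vector size), and GCC rejects a constexpr function that
  // calls it, as in the buildbot log.
  TensorLevel makeTensorLevel(TensorId t, Level l) const {
    return l * getNumTensors() + t;
  }

  // Hypothetical inverse of the encoding above.
  std::pair<TensorId, Level> unpackTensorLevel(TensorLevel tl) const {
    const unsigned n = getNumTensors();
    return {tl % n, tl / n};
  }

private:
  std::vector<int> tensors; // placeholder for the real per-tensor data
};
```

For example, with 3 tensors, tensor 2 at level 4 encodes to 4 * 3 + 2 = 14, and 14 decodes back to (14 % 3, 14 / 3) = (2, 4).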

Revision Contents

Path	Size
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h	144 lines
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp	964 lines
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp	2 lines
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp	13 lines
mlir/test/Dialect/SparseTensor/sparse_conv_2d_slice_based.mlir	300 lines
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_2d_slice_based.mlir	81 lines
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_3d_slice_based.mlir	97 lines

Diff 493420

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h

(51 unchanged lines not shown; context: public:)
using OutputUpdater = function_ref<Value(OpBuilder &builder, Location loc,		using OutputUpdater = function_ref<Value(OpBuilder &builder, Location loc,
Value memref, Value tensor)>;		Value memref, Value tensor)>;
// Map from [tid, dim] to a list of dependent [tid, dim] for affine expression		// Map from [tid, dim] to a list of dependent [tid, dim] for affine expression
// index on sparse tensors.		// index on sparse tensors.
// E.g., for affine index (d0 + d1), it depends on two [tid, dim] that defines		// E.g., for affine index (d0 + d1), it depends on two [tid, dim] that defines
// d0 and d1 (for affine expression reduction).		// d0 and d1 (for affine expression reduction).
// If the list is empty, it means that there is no affine expression on the		// If the list is empty, it means that there is no affine expression on the
// input [tid, dim].		// input [tid, dim].
		// NOTE: the order of the returned list should be consistent with the
		aartbik: comments with should or must are a bit dangling unless you say what happens when this assumption is not true
		aartbik: I think what is still missing is whether it is enforced (viz. asserts fail when trying to set it) or whether clients are responsible. So, something like Clients are responsible for ensuring that the order of the returned (I think my original comment was really on when I see "should" or "must", who is to blame in the end ;-)
		// topological order of the iteration graph.
using DependentDimGetter =		using DependentDimGetter =
function_ref<std::vector<std::pair<unsigned, unsigned>>(unsigned,		function_ref<std::vector<std::pair<unsigned, unsigned>>(unsigned,
unsigned)>;		unsigned)>;

LoopEmitter() = default;		LoopEmitter() = default;

/// Takes an array of tensors inputs, on which the generated loops will		/// Takes an array of tensors inputs, on which the generated loops will
/// iterate on. The index of the tensor in the array is also the tensor id		/// iterate on. The index of the tensor in the array is also the tensor id
(35 unchanged lines not shown; context: public:)
/// for (i = p0; i < end; i++)		/// for (i = p0; i < end; i++)
/// ...		/// ...
/// // loop sequence end.		/// // loop sequence end.
/// }		/// }
void enterNewLoopSeq(OpBuilder &builder, Location loc, ArrayRef<size_t> tids,		void enterNewLoopSeq(OpBuilder &builder, Location loc, ArrayRef<size_t> tids,
ArrayRef<size_t> dims);		ArrayRef<size_t> dims);

// exit the current loop sequence, this will reset universal index to 0.		// exit the current loop sequence, this will reset universal index to 0.
void exitCurrentLoopSeq() {		void exitCurrentLoopSeq(OpBuilder &builder, Location loc);
		aartbik: Why did you change Exits -> exit? Original seems okay
		wrengr: This should remain "/// Exits"
		wrengr: You want to keep the triple-slash "///", since that's what the tooling uses for generating the API documentation
assert(loopSeqStack.size() == loopStack.size() + 1);
loopSeqStack.pop_back();
}
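The assertion in `exitCurrentLoopSeq()` ties the two stacks together: a loop sequence can only be popped when there is exactly one more open sequence than open loops. A minimal standalone sketch of that bookkeeping follows; `LoopStacksModel` and its `int` placeholders are hypothetical stand-ins for `loopSeqStack` and `loopStack`, and only the exit-side check is taken from the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of LoopEmitter's two stacks. The int entries are
// placeholders for the per-sequence data and for LoopLevelInfo records.
struct LoopStacksModel {
  std::vector<int> loopSeqStack;
  std::vector<int> loopStack;

  void enterNewLoopSeq() { loopSeqStack.push_back(0); }
  void enterLoop() { loopStack.push_back(0); }
  void exitLoop() {
    assert(!loopStack.empty());
    loopStack.pop_back();
  }
  // Same check as exitCurrentLoopSeq() in the diff: exactly one more open
  // sequence than open loops, so (when sequences and loops alternate) no
  // loop entered inside the sequence being exited is still open.
  void exitCurrentLoopSeq() {
    assert(loopSeqStack.size() == loopStack.size() + 1);
    loopSeqStack.pop_back();
  }
};
```

A well-nested use is: enter sequence, enter loop, enter (and exit) an inner sequence, exit the loop, then exit the outer sequence.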

// TODO: Gets rid of `dim` in the argument list? Track the dimension we		// TODO: Gets rid of `dim` in the argument list? Track the dimension we
// are currently at internally. Then it would be enterNextDimForTensor.		// are currently at internally. Then it would be enterNextDimForTensor.
// Still need a way to specify the dim for non-annotated dense tensor though,		// Still need a way to specify the dim for non-annotated dense tensor though,
// as it can be accessed out of order.		// as it can be accessed out of order.
/// Emits loop over tensor_tid_dim, it assumes that loops between		/// Emits loop over tensor_tid_dim, it assumes that loops between
/// tensor_tid_[0, dim - 1] have already been generated.		/// tensor_tid_[0, dim - 1] have already been generated.
/// The function will also perform in-place update on the `reduc` vector to		/// The function will also perform in-place update on the `reduc` vector to
Show All 50 Lines	const std::vector<std::vector<Value>> &getIdxBuffer() const {
return idxBuffer;		return idxBuffer;
};		};
const std::vector<Value> &getValBuffer() const { return valBuffer; };		const std::vector<Value> &getValBuffer() const { return valBuffer; };

constexpr static llvm::StringLiteral getLoopEmitterLoopAttrName() {		constexpr static llvm::StringLiteral getLoopEmitterLoopAttrName() {
return llvm::StringLiteral("Emitted from");		return llvm::StringLiteral("Emitted from");
}		}

private:		private:
struct LoopLevelInfo {		struct LoopLevelInfo {
		aartbik: since we have several nested structs, can you give each a short comment (as documentation, and to improve readability)
LoopLevelInfo(ArrayRef<size_t> tids, ArrayRef<size_t> dims, Operation *loop,		LoopLevelInfo(ArrayRef<size_t> tids, ArrayRef<size_t> dims,
		ArrayRef<size_t> slicedTids, ArrayRef<size_t> slicedDims,
		ArrayRef<bool> sliceResolved, Operation *loop,
Block *userBlock, Value iv, StringAttr loopTag)		Block *userBlock, Value iv, StringAttr loopTag)
: tids(tids), dims(dims), loop(loop), userCodeBlock(userBlock), iv(iv) {		: tids(tids), dims(dims), slicedTids(slicedTids),
		slicedDims(slicedDims), sliceResolved(sliceResolved), loop(loop),
		userCodeBlock(userBlock), iv(iv) {
// Attached a special tag to loop emitter generated loop.		// Attached a special tag to loop emitter generated loop.
if (loopTag)		if (loopTag)
loop->setAttr(LoopEmitter::getLoopEmitterLoopAttrName(), loopTag);		loop->setAttr(LoopEmitter::getLoopEmitterLoopAttrName(), loopTag);
}		}
// TODO: maybe use a vector<pair> for tid and dim?		// TODO: maybe use a vector<pair> for tid and dim?
// The set of tensors that the loop is operating on		// The set of tensors that the loop is operating on
const llvm::SmallVector<size_t> tids;		const llvm::SmallVector<size_t> tids;
// The corresponding dims for the tensors		// The corresponding dims for the tensors
		aartbik: Here and below (and above), period at end
		wrengr: This comment should explain how exactly the `slicedTids` differs from the other `tids`. Also, is the same tensor allowed to occur in both fields? If so, then what does that mean? and, can the same level of the same tensor occur on both of the corresponding fields?
const llvm::SmallVector<size_t> dims;		const llvm::SmallVector<size_t> dims;
		// The set of tensors that the loop is operating on
		wrengr: "levels"
		const llvm::SmallVector<size_t> slicedTids;
		// The corresponding dims for the tensors
		wrengr: This comment is wrong for this field
		const llvm::SmallVector<size_t> slicedDims;
		wrengr: I think it'd be better to combine these all into a single `const SmallVector<LoopLevelInfo>` where `struct LoopLevelInfo final { TensorId tid; Level lvl; bool isSlice; bool isReduced; };`, assuming it's okay to combine the original `(tids,lvls)` with the new `(slicedTids,slicedLvls,sliceReduced)` into a single vector/set. If that's not okay for some reason, then I still think it'd be good to use a single `const SmallVector<LoopSlicedLevelInfo>` field for the new stuff. Using AOS will ensure that there's the right number of all the things that should correspond, as well as keeping the corresponding things close together. Plus it'll make it easier to add additional fields in the future as needed.
		Peiming: I agree with you on this, in fact, see my comment at L222, I will do it in a separate patch though.
		// The corresponding dims for the tensors
		const llvm::SmallVector<bool> sliceResolved;
const Operation *loop; // the loop operation		const Operation *loop; // the loop operation
Block *const userCodeBlock; // the block holding the user's generated code.		Block *const userCodeBlock; // the block holding the user's generated code.
const Value iv; // the induction variable for the loop		const Value iv; // the induction variable for the loop
};		};
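The array-of-structs layout suggested in the review can be sketched as follows. The record name `LoopTidLvl`, its field names, and the plain `unsigned` aliases for `TensorId`/`Level` are assumptions for illustration; the diff itself still keeps the parallel `tids`/`dims`/`slicedTids`/`slicedDims`/`sliceResolved` vectors. The record is marked `final`, as also requested in the review, since nothing derives from it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the newtypes mentioned in the review.
using TensorId = unsigned;
using Level = unsigned;

// One record per (tensor, level) the loop iterates, so the per-entry
// fields can never disagree in length.
struct LoopTidLvl final {
  TensorId tid;
  Level lvl;
  bool isSlice;    // true for entries currently kept in slicedTids/slicedDims
  bool isResolved; // meaningful only when isSlice is true (sliceResolved)
};

// Converts the struct-of-arrays fields into the AOS form above.
std::vector<LoopTidLvl> toAOS(const std::vector<TensorId> &tids,
                              const std::vector<Level> &lvls,
                              const std::vector<TensorId> &slicedTids,
                              const std::vector<Level> &slicedLvls,
                              const std::vector<bool> &sliceResolved) {
  assert(tids.size() == lvls.size());
  assert(slicedTids.size() == slicedLvls.size() &&
         slicedTids.size() == sliceResolved.size());
  std::vector<LoopTidLvl> out;
  for (std::size_t i = 0; i < tids.size(); ++i)
    out.push_back({tids[i], lvls[i], /*isSlice=*/false, /*isResolved=*/false});
  for (std::size_t i = 0; i < slicedTids.size(); ++i)
    out.push_back({slicedTids[i], slicedLvls[i], /*isSlice=*/true,
                   sliceResolved[i]});
  return out;
}
```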

		struct SliceInfo {
		wrengr: "...do not need to actually create a sparse..."
		SliceInfo(Value baseSlice, Value minCoord, Value offset, Value isNonEmpty,
		wrengr: "...only need to maintain the..."
		std::optional<unsigned> slicedOnLvl, unsigned depth)
		wrengr: Is this actually the full `MemRef` of coordinates for all levels? If so then it should be "minCoords" (with an "s"). Whereas if it's just a single coordinate for the given level, then it should be "minCrd".
		: baseSlice(baseSlice), minCoord(minCoord), offset(offset),
		wrengr: I'm guessing this should be `LoopOrd`. If not, then what is it?
		isNonEmpty(isNonEmpty), slicedOnLvl(slicedOnLvl), depth(depth) {
		assert(!slicedOnLvl || minCoord);
		}
		wrengr: Given our discussion of boolean blindness, this assertion suggests that the two parameters should be combined into a single `std::optional<std::pair<Level, Value>>` parameter. (Albeit you'll still want to assert you didn't get a null `Value`.) If that combined parameter doesn't work, why not? The only other thing that would be consistent with the assertion is `union{ non-null-Value; struct{non-null-Value; Level}}` but that's equivalent to `struct{non-null-Value; optional<Level>}`, so if that's what you want then you should `assert(minCoord)` instead.
		Peiming: It should work, I will add a TODO here and submit the change in a separate patch.

		// Whether this is the first slice
		bool isInitialTensor() const { return !slicedOnLvl.has_value(); }

		wrengr: Given the comment, I think this would be better named something like "isFirstSlice" or "isInitialSlice".
		Peiming: I will change the comment, it means whether it is the initial tensor that has not yet been sliced.
		Value baseSlice; // the current slices being reduced
		Value minCoord; // the minimal coordinates of the slice on lvl.
		wrengr: If the value is just a single coordinate, then this should also be singular.
		Value offset; // the offset of the current slice.
		Value isNonEmpty; // whether the slice is non-empty.
		std::optional<unsigned> slicedOnLvl; // the level on which the slice is done
		unsigned depth; // the depth (relative to dependentDimMap[tid][lvl]).
		aartbik: Wren can correct me if I am wrong, but I think this needs to be minimum, right (as in smallest value, and not the lowest according to some other measure)?
		};
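One way to move the `assert(!slicedOnLvl || minCoord)` precondition into the type, as the review suggests, is to carry the sliced level and its minimum coordinate together in a `std::optional<std::pair<Level, Value>>`. The sketch below is hypothetical: `FakeValue` stands in for `mlir::Value` (only null versus non-null matters here), and `SliceInfoSketch` keeps only the fields needed to show the idea.

```cpp
#include <cassert>
#include <optional>
#include <utility>

using Level = unsigned; // stand-in for the Level typedef

// Stand-in for mlir::Value: only "null vs. non-null" matters here.
struct FakeValue {
  const void *impl = nullptr;
  explicit operator bool() const { return impl != nullptr; }
};

// Hypothetical reshaping of SliceInfo: the sliced level and its minimum
// coordinate are present together or absent together.
struct SliceInfoSketch final {
  SliceInfoSketch(FakeValue baseSlice,
                  std::optional<std::pair<Level, FakeValue>> slicedOn,
                  unsigned depth)
      : baseSlice(baseSlice), slicedOn(slicedOn), depth(depth) {
    // The only check left: a present entry must carry a non-null value.
    assert(!slicedOn || slicedOn->second);
  }

  // True for the initial tensor, i.e. one that has not been sliced yet.
  bool isInitialTensor() const { return !slicedOn.has_value(); }

  FakeValue baseSlice;
  std::optional<std::pair<Level, FakeValue>> slicedOn; // (lvl, min coordinate)
  unsigned depth;
};
```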

/// Linearizes address for dense dimension (i.e., p = (i * d0) + j).		/// Linearizes address for dense dimension (i.e., p = (i * d0) + j).
		wrengr: Can you make this class "`final`" (ditto for the `LoopInfo`). You don't need/use subclassing here, and marking it final helps the compiler generate more efficient code
Value genAddress(OpBuilder &builder, Location loc, size_t tid, size_t dim,		Value genAddress(OpBuilder &builder, Location loc, size_t tid, size_t dim,
Value iv);		Value iv);

/// Generate a predicate to determine whether the transformed coordinates are		/// Generate a predicate to determine whether the transformed coordinates are
/// on the given slice.		/// on the given slice.
/// Returns std::pair<Transformed coordinates, Predicate>		/// Returns std::pair<Transformed coordinates, Predicate>
std::pair<Value, Value> genSliceLegitPredicate(OpBuilder &builder,		std::pair<Value, Value> genSliceLegitPredicate(OpBuilder &builder,
Location loc, Value coord,		Location loc, Value coord,
Show All 11 Lines	private:

/// Emits extra locals, since the locals might not be in the simplified lattice		/// Emits extra locals, since the locals might not be in the simplified lattice
/// point used to generate the loops, but are still required to generate		/// point used to generate the loops, but are still required to generate
/// expressions.		/// expressions.
void emitExtraLocalsForTensorsAtDenseDims(OpBuilder &builder, Location loc,		void emitExtraLocalsForTensorsAtDenseDims(OpBuilder &builder, Location loc,
ArrayRef<size_t> tids,		ArrayRef<size_t> tids,
ArrayRef<size_t> dims);		ArrayRef<size_t> dims);

		Operation *emitForLoopOverTensorAtDim(OpBuilder &builder, Location loc,
		size_t tid, size_t dim,
		wrengr: This should be `Level lvl` (I'm pretty sure)
		MutableArrayRef<Value> reduc,
		bool isParallel);

/// Exits a for loop, returns the reduction results, e.g.,		/// Exits a for loop, returns the reduction results, e.g.,
/// For sequential for loops:		/// For sequential for loops:
/// %ret = for () {		/// %ret = for () {
/// ...		/// ...
		aartbik: is exceeds -> exceeds but more importantly, I would state this, We break out of the loop when the coordinate exceeds the slideSize.
/// %val = addi %args, %c		/// %val = addi %args, %c
/// yield %val		/// yield %val
/// }		/// }
/// For parallel loops, the following generated code by users:		/// For parallel loops, the following generated code by users:
/// %ret = parallel () init(%args) {		/// %ret = parallel () init(%args) {
/// ...		/// ...
/// %val = op %args, %c		/// %val = op %args, %c
/// }		/// }
Show All 11 Lines	private:
/// users (`reduc`).		/// users (`reduc`).
void exitForLoop(RewriterBase &rewriter, Location loc,		void exitForLoop(RewriterBase &rewriter, Location loc,
MutableArrayRef<Value> reduc);		MutableArrayRef<Value> reduc);

/// Exits a while loop, returns the reduction results.		/// Exits a while loop, returns the reduction results.
void exitWhileLoop(OpBuilder &builder, Location loc,		void exitWhileLoop(OpBuilder &builder, Location loc,
MutableArrayRef<Value> reduc);		MutableArrayRef<Value> reduc);

		//
		// Slice-driven loop related methods.
		//

		/// Retrieves the most recent slices on lvl. To reduce affine expression like
		aartbik: the most recent slice (singular)
		/// d0 + d1 + d2, we need two slices (one of size d1 + d2, and the other of
		/// size d2). This method returns the latter slice (of size d2), which is
		wrengr: This description doesn't make sense to me. Do you have a design doc that explains what exactly you mean by "slice" in this context, and explains how/why "reducing `d0+d1+d2`" translates into needing the two slices you mention?
		Peiming: Yeah, I am writing a paper on this (but still at very early stage), I will share it with you later when it is more or less complete.
		/// also the final slice on the level.
		SliceInfo &getFinalSliceOnLvl(size_t tid, size_t lvl);
		aartbik: const ref?

		/// Get the total number of constraints that needed to fully resolve the
		wrengr: Either "number of constraints needed to..." or "number of constraints that are needed to..."
		/// dependent dimension on tensor[tid].
		wrengr: "level"
		size_t sliceTotalConstraints(size_t tid);
		aartbik: perhaps we should discuss somewhere else, but we use "unsigned" at most places, and size_t only for local operations, or inside casts and asserts Since this is part of the API, I would prefer keeping it to unsigned, unless you have very strong reasons for this

		/// Whether the tid is fully resolved, i.e., all the dependent dimensions are
		/// reduced by slice offsets.
		bool sliceFullyResolved(size_t tid);
		wrengr: That should be "A[i+j] => A[i+2]" to make it clear that it derives from "j => 2".

		/// Generates a whileOp to iterate over a subset of coordinates on tid on lvl
		/// using the provided pLo and pHi; the loop breaks on the first coordinate
		/// that exceeds the slice boundary (i.e., coord >= slice.offset +
		/// slice.size).
		std::pair<Operation *, ValueRange>
		genSliceLvlTraverseLoop(OpBuilder &builder, Location loc, Value pLo,
		Value pHi, Value offset, size_t tid, size_t lvl,
		size_t depth, ValueRange userReduc, bool genYield,
		/*bodyBody=*/
		llvm::function_ref<void(OpBuilder &, Location, Value,
		MutableArrayRef<Value>)>);
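As documented, `genSliceLvlTraverseLoop` walks positions `[pLo, pHi)` and breaks on the first coordinate at or beyond `slice.offset + slice.size`. A plain C++ model of that control flow follows, assuming the level's coordinates are sorted and that `pLo` already points at the first coordinate not before the offset (as the `slicePtrBuffer` comment suggests). The function name and callback signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical model of the loop genSliceLvlTraverseLoop documents: visit
// positions [pLo, pHi) of a level whose coordinates `crd` are sorted, and stop
// at the first coordinate >= offset + size. Returns the stopping position.
std::size_t traverseSliceLvl(
    const std::vector<uint64_t> &crd, std::size_t pLo, std::size_t pHi,
    uint64_t offset, uint64_t size,
    const std::function<void(std::size_t, uint64_t)> &body) {
  assert(pLo <= pHi && pHi <= crd.size());
  std::size_t pos = pLo;
  for (; pos < pHi; ++pos) {
    if (crd[pos] >= offset + size)
      break; // sorted input: no later coordinate can be inside the slice
    body(pos, crd[pos]);
  }
  return pos;
}
```

With coordinates {1, 3, 4, 7, 9}, `pLo = 1`, offset 3 and size 4, the body sees 3 and 4, and the walk stops at position 3 because 7 is not below 3 + 4.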

		/// Generates a nested loop that iterates over tid on all the coordinates on
		/// lvl.
		ValueRange genSliceAllLvlTraverseLoop(
		OpBuilder &builder, Location loc, Value offset, size_t tid, size_t lvl,
		size_t depth, ValueRange userReduc,
		/*bodyBody=*/
		wrengr: Should this be `LoopOrd`? If not, then what is it?
		Peiming: This is `unsigned index` to another array (not the loop sequence). I will stick with this.
		llvm::function_ref<void(OpBuilder &, Location, Value,
		MutableArrayRef<Value>)>);

		/// Generates code to get the first non-empty slice of tid on lvl.
		/// Returns true if it has already been resolved.
		bool genSliceBegin(OpBuilder &builder, Location loc, size_t tid, size_t lvl);

		/// Generates code to get the next non-empty slice of tid on lvl.
		void genSliceNextInduction(OpBuilder &builder, Location loc,
		const Operation *whileOp, size_t tid, size_t lvl,
		SmallVectorImpl<Value> &operands,
		unsigned &retIdx);

		/// Generates a slice-driven while loop like follows.
		///
		/// curSlice = getFirstNonEmptySlice(tensor).
		///
		/// while(isNonEmpty) {
		/// ..user code..
		/// isNonEmpty, curSlice = getNextNonEmptySlice(curSlice)
		/// }
		Operation *emitSliceDrivenLoopOverTensorAtDim(OpBuilder &builder,
		wrengr: What is this supposed to be: `LoopOrd`, `LoopId`, other?
		Peiming: This is an `unsigned` counter.
		Location loc, size_t tid,
		size_t lvl,
		MutableArrayRef<Value> reduc);
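The pseudocode above only says that empty slices are skipped. The sketch below is a simplified model under stated assumptions, not the emitter's generated IR: a single level of extent `dimSize` with sorted coordinates `crd`, a fixed slice size, and stride 1. It computes the next offset whose window `[offset, offset + size)` contains a coordinate, which is what `getFirstNonEmptySlice`/`getNextNonEmptySlice` have to produce; both helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Smallest offset >= `from` whose window [offset, offset + size) contains a
// coordinate of `crd` (sorted), or nullopt if every such window is empty.
std::optional<uint64_t> nextNonEmptyOffset(const std::vector<uint64_t> &crd,
                                           uint64_t from, uint64_t size,
                                           uint64_t dimSize) {
  assert(size >= 1 && size <= dimSize);
  const uint64_t maxOffset = dimSize - size;
  for (uint64_t c : crd) {
    if (c < from)
      continue; // before every window that is still to be visited
    // Smallest offset >= from whose window holds c: max(from, c - size + 1).
    uint64_t offset = c + 1 >= size ? c + 1 - size : 0;
    if (offset < from)
      offset = from;
    if (offset > maxOffset)
      return std::nullopt; // later coordinates only need larger offsets
    return offset;
  }
  return std::nullopt; // no coordinate left, so every later window is empty
}

// Mirrors the pseudocode: first non-empty slice, user code, next slice.
std::vector<uint64_t> visitNonEmptySlices(const std::vector<uint64_t> &crd,
                                          uint64_t size, uint64_t dimSize) {
  std::vector<uint64_t> visited;
  std::optional<uint64_t> cur = nextNonEmptyOffset(crd, 0, size, dimSize);
  while (cur) {              // while (isNonEmpty)
    visited.push_back(*cur); // ..user code..
    cur = nextNonEmptyOffset(crd, *cur + 1, size, dimSize);
  }
  return visited;
}
```

For coordinates {2, 7}, size 3 and extent 10, the visited offsets are 0, 1, 2, 5, 6, 7; offsets 3 and 4 are skipped because the windows [3, 6) and [4, 7) contain no coordinate.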

/// An optional string attribute that should be attached to the loop		/// An optional string attribute that should be attached to the loop
/// generated by the loop emitter, which might help following passes to identify		/// generated by the loop emitter, which might help following passes to identify
/// loops that operate on sparse tensors more easily.		/// loops that operate on sparse tensors more easily.
StringAttr loopTag;		StringAttr loopTag;
/// Whether the loop emitter needs to treat the last tensor as the output		/// Whether the loop emitter needs to treat the last tensor as the output
/// tensor.		/// tensor.
bool hasOutput;		bool hasOutput;
bool isSparseOut;		bool isSparseOut;
/// Input and (optional) output tensors.		/// Input and (optional) output tensors.
		aartbik: period at end. "to allocate top level local" makes very little sense when read in isolation. Just say what code fragment this points to
std::vector<Value> tensors;		std::vector<Value> tensors;
/// Values related to slices.		/// Values related to slices.
std::vector<bool> isSparseSlices;		std::vector<bool> isSparseSlices;
std::vector<std::vector<Value>> sliceOffsets;		std::vector<std::vector<Value>> sliceOffsets;
std::vector<std::vector<Value>> sliceStrides;		std::vector<std::vector<Value>> sliceStrides;
/// The dim and dim type array for each tensor.		/// The dim and dim type array for each tensor.
		aartbik: as follows?
std::vector<std::vector<Value>> dims;		std::vector<std::vector<Value>> dims;
std::vector<std::vector<DimLevelType>> dimTypes;		std::vector<std::vector<DimLevelType>> dimTypes;
/// Sparse iteration information (by tensor and dim). These arrays		/// Sparse iteration information (by tensor and dim). These arrays
/// are updated to remain current within the current loop.		/// are updated to remain current within the current loop.
std::vector<std::vector<Value>> pidxs;		std::vector<std::vector<Value>> pidxs;
std::vector<std::vector<Value>> coord;		std::vector<std::vector<Value>> coord;
std::vector<std::vector<Value>> highs;		std::vector<std::vector<Value>> highs;
std::vector<std::vector<Value>> ptrBuffer; // to_pointers		std::vector<std::vector<Value>> ptrBuffer; // to_pointers
std::vector<std::vector<Value>> idxBuffer; // to_indices		std::vector<std::vector<Value>> idxBuffer; // to_indices
std::vector<Value> valBuffer; // to_value		std::vector<Value> valBuffer; // to_value

// Map from [tid, dim] to a list of dependent [tid, dim].
// See comments for `DependentDimGetter`.
std::vector<std::vector<std::vector<std::pair<unsigned, unsigned>>>>
dependentDimMap;

// Loop Stack, stores the information of all the nested loops that are		// Loop Stack, stores the information of all the nested loops that are
// alive.		// alive.
std::vector<LoopLevelInfo> loopStack;		std::vector<LoopLevelInfo> loopStack;

// Loop Sequence Stack, stores the universal index for the current loop		// Loop Sequence Stack, stores the universal index for the current loop
// sequence.		// sequence, and a list of the tids that were sliced.
std::vector<Value> loopSeqStack;		// TODO: maybe we should have a LoopSeqInfo
		std::vector<
		std::pair<Value, std::vector<std::tuple<unsigned, unsigned, bool>>>>
		loopSeqStack;

// Maps AffineDimExpr to the index of the loop in loopStack.		// Maps AffineDimExpr to the index of the loop in loopStack.
// TODO: We should probably use a callback function here to make it more		// TODO: We should probably use a callback function here to make it more
// general.		// general.
std::vector<unsigned> sparsiferLoopLvlMap;		std::vector<unsigned> sparsiferLoopLvlMap;

		//
		// Slice-driven loops related fields.
		//

		// Map from [tid, dim] to a list of dependent [tid, dim].
		// See comments for `DependentDimGetter`.
		std::vector<std::vector<std::vector<std::pair<unsigned, unsigned>>>>
		dependentDimMap;
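`dependentDimMap` is indexed as `[tid][lvl]` and holds the list of `(tid, dim)` pairs returned by the `DependentDimGetter` callback. The sketch below shows one way to fill a table of that shape; `std::function` replaces `llvm::function_ref` so the snippet is standalone, and the per-tensor level ranks are an assumed input, since the initialization code is not shown in this diff.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical standalone stand-in for DependentDimGetter.
using DependentDimGetterSketch =
    std::function<std::vector<std::pair<unsigned, unsigned>>(unsigned,
                                                             unsigned)>;

using DepMap =
    std::vector<std::vector<std::vector<std::pair<unsigned, unsigned>>>>;

// Builds a [tid][lvl] -> dependents table with the shape of dependentDimMap.
DepMap buildDependentDimMap(const std::vector<unsigned> &lvlRanks,
                            const DependentDimGetterSketch &getter) {
  DepMap map;
  map.reserve(lvlRanks.size());
  for (unsigned tid = 0; tid < lvlRanks.size(); ++tid) {
    map.emplace_back(lvlRanks[tid]); // one (initially empty) list per level
    for (unsigned lvl = 0; lvl < lvlRanks[tid]; ++lvl)
      map[tid][lvl] = getter(tid, lvl);
  }
  return map;
}
```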

		// The cached pointer buffers for the slices. They serve the same purpose as
		// ptrBuffer for compressed dimensions, but always start with the first
		// pidx pointing to coord > slice.offset to avoid iterating from the
		// beginning.
		std::vector<std::vector<std::vector<Value>>> slicePtrBuffer;

		// The cached size for each slice.
		std::vector<std::vector<std::vector<Value>>> sliceSizes;

		// The number of resolved constraints so far.
		std::vector<unsigned> sliceResolvedConstraints;

		// sliceStack[tid] holds the generated slice stack on tid.
		std::vector<std::vector<SliceInfo>> sliceStack;

// TODO: not yet used, it should track the current level for each tensor		// TODO: not yet used, it should track the current level for each tensor
// to help eliminate `dim` parameters from above APIs.		// to help eliminate `dim` parameters from above APIs.
// std::vector<size_t> curLv;		// std::vector<size_t> curLv;
};		};

} // namespace sparse_tensor		} // namespace sparse_tensor
} // namespace mlir		} // namespace mlir

#endif // MLIR_DIALECT_SPARSETENSOR_TRANSFORMS_SPARSETENSORLOOPEMITTER_H_		#endif // MLIR_DIALECT_SPARSETENSOR_TRANSFORMS_SPARSETENSORLOOPEMITTER_H_

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp

Show All 9 Lines
#include "CodegenUtils.h"		#include "CodegenUtils.h"

#include "mlir/Dialect/Arith/IR/Arith.h"		#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/Bufferization/IR/Bufferization.h"		#include "mlir/Dialect/Bufferization/IR/Bufferization.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"		#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Linalg/Utils/Utils.h"		#include "mlir/Dialect/Linalg/Utils/Utils.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"		#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/SCF/IR/SCF.h"		#include "mlir/Dialect/SCF/IR/SCF.h"
		#include "mlir/Dialect/Tensor/IR/Tensor.h"

using namespace mlir;		using namespace mlir;
using namespace mlir::sparse_tensor;		using namespace mlir::sparse_tensor;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// File local helper functions.		// File local helper functions.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		/// Extracts a corresponding vector of type from a ValueRange.
		static SmallVector<Type> getTypesFromValues(ValueRange vs) {
		SmallVector<Type> ret;
		for (auto v : vs)
		ret.push_back(v.getType());
		return ret;
		}

/// Generates a pointer/index load from the sparse storage scheme. Narrower		/// Generates a pointer/index load from the sparse storage scheme. Narrower
/// data types need to be zero extended before casting the value into the		/// data types need to be zero extended before casting the value into the
/// index type used for looping and indexing.		/// index type used for looping and indexing.
static Value genIndexLoad(OpBuilder &builder, Location loc, Value ptr,		static Value genIndexLoad(OpBuilder &builder, Location loc, Value ptr,
Value s) {		Value s) {
		wrengr: I'd prefer these be defined as `static inline` functions rather than as macros, since that gives better type-safety and compiler error messages. If you need to use CMPI at several different types, then just use a template.
		Peiming: I will stick with this. The reason I use macro is that I want to avoid typing `builder` and `loc`.
// For the scalar case, we simply zero extend narrower indices into 64-bit		// For the scalar case, we simply zero extend narrower indices into 64-bit
// values before casting to index without a performance penalty. Here too,		// values before casting to index without a performance penalty. Here too,
// however, indices that already are 64-bit, in theory, cannot express the		// however, indices that already are 64-bit, in theory, cannot express the
// full range as explained above.		// full range as explained above.
Value load = builder.create<memref::LoadOp>(loc, ptr, s);		Value load = builder.create<memref::LoadOp>(loc, ptr, s);
		wrengr: Please don't undo this variable naming. The "mem" matches other places, and avoids confusion about whether "ptr" means `MemRef` vs the old "pointer" (now "position") vs llvm-pointers vs...
		Peiming: Sry, probably it get overlooked during rebasing
if (!load.getType().isa<IndexType>()) {		if (!load.getType().isa<IndexType>()) {
		wrengr: You should just use `ValueRange::getTypes()`
if (load.getType().getIntOrFloatBitWidth() < 64)		if (load.getType().getIntOrFloatBitWidth() < 64)
load = builder.create<arith::ExtUIOp>(loc, builder.getI64Type(), load);		load = builder.create<arith::ExtUIOp>(loc, builder.getI64Type(), load);
load =		load =
builder.create<arith::IndexCastOp>(loc, builder.getIndexType(), load);		builder.create<arith::IndexCastOp>(loc, builder.getIndexType(), load);
}		}
return load;		return load;
}		}
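`genIndexLoad` zero-extends narrower values (`arith::ExtUIOp`) before the cast to `index` because stored pointers/indices are unsigned. The difference from a sign extension appears as soon as the top bit of the narrow value is set; the two helpers below are a hypothetical plain-integer illustration on 8-bit values.

```cpp
#include <cassert>
#include <cstdint>

// Zero extension (like arith::ExtUIOp): keeps the unsigned magnitude of a
// stored 8-bit position/coordinate when widening it to 64 bits.
uint64_t zeroExtend(uint8_t stored) { return static_cast<uint64_t>(stored); }

// Sign extension (like arith::ExtSIOp), for contrast: the top bit is read as
// a sign, so large stored values turn negative. (Narrowing to int8_t wraps
// on mainstream compilers and is guaranteed to since C++20.)
int64_t signExtend(uint8_t stored) {
  return static_cast<int64_t>(static_cast<int8_t>(stored));
}
```

A stored byte `0xFF` is position 255 under zero extension but -1 under sign extension, which would index far outside the buffer.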

Show All 9 Lines	static Value genSliceStride(OpBuilder &builder, Location loc, Value tensor,
return createOrFoldSliceStrideOp(builder, loc, tensor, toOrigDim(enc, lvl));		return createOrFoldSliceStrideOp(builder, loc, tensor, toOrigDim(enc, lvl));
}		}

static Value toSliceCoord(OpBuilder &builder, Location loc, Value v,		static Value toSliceCoord(OpBuilder &builder, Location loc, Value v,
Value offset, Value stride, Value tensor,		Value offset, Value stride, Value tensor,
unsigned lvl) {		unsigned lvl) {
// iv = iv * stride + offset		// iv = iv * stride + offset
v = builder.create<arith::MulIOp>(loc, v, stride);		v = builder.create<arith::MulIOp>(loc, v, stride);
v = builder.create<arith::AddIOp>(loc, v, offset);		v = builder.create<arith::AddIOp>(loc, v, offset);
		aartbik: This computation does not match my mental interpretation of the text above (L79)
return v;		return v;
}		}

static std::pair<Value, Value> fromSliceCoord(OpBuilder &builder, Location loc,		static std::pair<Value, Value> fromSliceCoord(OpBuilder &builder, Location loc,
		aartbik: // offset adds very little, Either use a sentence or remove
Value iv, Value offset,		Value iv, Value offset,
Value stride, Value tensor,		Value stride, Value tensor,
unsigned lvl) {		unsigned lvl) {
// iv = (iv - offset) / stride		// iv = (iv - offset) / stride
iv = builder.create<arith::SubIOp>(loc, iv, offset);		iv = builder.create<arith::SubIOp>(loc, iv, offset);
Value rem = builder.create<arith::RemUIOp>(loc, iv, stride);		Value rem = builder.create<arith::RemUIOp>(loc, iv, stride);
iv = builder.create<arith::DivUIOp>(loc, iv, stride);		iv = builder.create<arith::DivUIOp>(loc, iv, stride);
return std::make_pair(iv, rem);		return std::make_pair(iv, rem);
}		}
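`toSliceCoord` computes `iv * stride + offset`, and `fromSliceCoord` inverts it with unsigned division, also returning the remainder so a caller can tell whether a coordinate falls on a stride step. The sketch below mirrors that arithmetic on plain integers. `onSlice` is a hypothetical reconstruction of the kind of check `genSliceLegitPredicate` has to perform (its body is collapsed in this view), so treat its exact conditions as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// toSliceCoord: slice-relative coordinate -> coordinate in the base tensor.
uint64_t toSliceCoordSketch(uint64_t iv, uint64_t offset, uint64_t stride) {
  return iv * stride + offset;
}

// fromSliceCoord: base coordinate -> (slice coordinate, remainder), using
// unsigned division/remainder like arith.divui/arith.remui.
std::pair<uint64_t, uint64_t> fromSliceCoordSketch(uint64_t crd, uint64_t offset,
                                                   uint64_t stride) {
  assert(stride != 0);
  uint64_t d = crd - offset; // wraps around if crd < offset
  return {d / stride, d % stride};
}

// Assumed predicate: on the slice iff not before the offset, exactly on a
// stride step, and the resulting slice coordinate is within the slice size.
bool onSlice(uint64_t crd, uint64_t offset, uint64_t stride, uint64_t size) {
  if (crd < offset)
    return false; // needed explicitly because the subtraction is unsigned
  auto [iv, rem] = fromSliceCoordSketch(crd, offset, stride);
  return rem == 0 && iv < size;
}
```

For offset 2, stride 4 and size 5, slice coordinate 3 maps to base coordinate 14, and mapping 14 back yields (3, 0); 15 leaves remainder 1 and 22 maps to slice coordinate 5, so both are off the slice.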

		/// Helper method to generate a tensor.extract_slice operation with the given
		/// offset and size on dim.
		static Value genExtractSliceWithOffsetOnDim(OpBuilder &builder, Location loc,
		Value src, Value dynOffset,
		Value sz, unsigned dim) {

		RankedTensorType srcTp = src.getType().cast<RankedTensorType>();
		int rank = srcTp.getRank();

		SmallVector<int64_t> offsets(rank, 0);
		SmallVector<int64_t> strides(rank, 1);
		SmallVector<int64_t> sizes(srcTp.getShape());
		SmallVector<Value> dynSizes;

		offsets[dim] = ShapedType::kDynamic;
		sizes[dim] = ShapedType::kDynamic;

		auto srcEncoding = getSparseTensorEncoding(srcTp);
		SmallVector<SparseTensorDimSliceAttr> sliceAttrs;

		for (unsigned i = 0, e = sizes.size(); i < e; i++) {
		// Infers the slice attribute array (sets offset/size to dynamic on the
		// slicing dimension).
		int offset = i == dim ? SparseTensorDimSliceAttr::kDynamic : 0;
		int size = (i == dim || ShapedType::isDynamic(sizes[i]))
		? SparseTensorDimSliceAttr::kDynamic
		: sizes[i];
		sliceAttrs.push_back(SparseTensorDimSliceAttr::get(srcTp.getContext(),
		/*offset=*/offset,
		/*size=*/size,
		/*stride=*/1));
		if (ShapedType::isDynamic(sizes[i])) {
		if (dim == i)
		dynSizes.push_back(sz);
		else
		dynSizes.push_back(linalg::createOrFoldDimOp(builder, loc, src, i));
		}
		}

		// Keeps original encodings but attaches slice attribute.
		auto encoding = SparseTensorEncodingAttr::get(
		srcTp.getContext(), srcEncoding.getDimLevelType(),
		srcEncoding.getDimOrdering(), srcEncoding.getHigherOrdering(),
		srcEncoding.getPointerBitWidth(), srcEncoding.getIndexBitWidth(),
		sliceAttrs);

		auto retTp = RankedTensorType::get(sizes, srcTp.getElementType(), encoding);
		return builder
		.create<tensor::ExtractSliceOp>(loc, retTp, src, ValueRange{dynOffset},
		dynSizes, ValueRange{}, offsets, sizes,
		strides)
		.getResult();
		}
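The loop in `genExtractSliceWithOffsetOnDim` builds one `SparseTensorDimSliceAttr` per dimension: only the slicing dimension `dim` gets a dynamic offset and size, while every other dimension keeps offset 0, stride 1, and its static size (or a dynamic size if the source size is dynamic). A standalone model of that inference follows; `DimSlice` and the single `kDyn` sentinel are hypothetical, and the real `ShapedType::kDynamic` and `SparseTensorDimSliceAttr::kDynamic` constants need not share a value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical single sentinel for "dynamic" in this sketch.
constexpr int64_t kDyn = -1;

struct DimSlice {
  int64_t offset, size, stride;
};

// Per-dimension slice attributes for slicing `shape` along `dim`.
std::vector<DimSlice> inferSliceAttrs(const std::vector<int64_t> &shape,
                                      unsigned dim) {
  assert(dim < shape.size());
  std::vector<DimSlice> attrs;
  for (unsigned i = 0; i < shape.size(); ++i) {
    int64_t offset = i == dim ? kDyn : 0;
    int64_t size = (i == dim || shape[i] == kDyn) ? kDyn : shape[i];
    attrs.push_back({offset, size, /*stride=*/1});
  }
  return attrs;
}
```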

std::pair<Value, Value>		std::pair<Value, Value>
LoopEmitter::genSliceLegitPredicate(OpBuilder &builder, Location loc,		LoopEmitter::genSliceLegitPredicate(OpBuilder &builder, Location loc,
Value coord, unsigned tid, unsigned lvl) {		Value coord, unsigned tid, unsigned lvl) {
assert(isSparseSlices[tid]);		assert(isSparseSlices[tid]);
Value slice = tensors[tid];		Value slice = tensors[tid];
Value offset = sliceOffsets[tid][lvl];		Value offset = sliceOffsets[tid][lvl];
Value stride = sliceStrides[tid][lvl];		Value stride = sliceStrides[tid][lvl];
auto enc = getSparseTensorEncoding(slice.getType());		auto enc = getSparseTensorEncoding(slice.getType());
[...]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Sparse tensor loop emitter class implementations		// Sparse tensor loop emitter class implementations
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

Value LoopEmitter::genAddress(OpBuilder &builder, Location loc, size_t tid,		Value LoopEmitter::genAddress(OpBuilder &builder, Location loc, size_t tid,
size_t dim, Value iv) {		size_t dim, Value iv) {
Value p = dim == 0 ? constantIndex(builder, loc, 0) : pidxs[tid][dim - 1];		Value p = dim == 0 ? constantIndex(builder, loc, 0) : pidxs[tid][dim - 1];
Value mul = builder.create<arith::MulIOp>(loc, highs[tid][dim], p);		Value mul = builder.create<arith::MulIOp>(loc, highs[tid][dim], p);
if (isSparseSlices[tid])		if (isSparseSlices[tid]) {
iv = toSliceCoord(builder, loc, iv, sliceOffsets[tid][dim],		iv = toSliceCoord(builder, loc, iv, sliceOffsets[tid][dim],
sliceStrides[tid][dim], tensors[tid], dim);		sliceStrides[tid][dim], tensors[tid], dim);
		}
Value add = builder.create<arith::AddIOp>(loc, mul, iv);		Value add = builder.create<arith::AddIOp>(loc, mul, iv);
return add;		return add;
}		}
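`genAddress` linearizes a dense level: the position is the level size times the parent position, plus the coordinate. A minimal integer sketch, assuming the emitted `arith.muli`/`arith.addi` compute exactly this, and assuming that `toSliceCoord` (whose body is not shown here) maps a slice coordinate back to `offset + coord * stride`; both function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic of genAddress on plain integers.
//   parentPos: position in the parent level (0 for the outermost level)
//   levelSize: size of this dense level (highs[tid][dim])
//   coord:     coordinate on this level (the loop induction variable)
int64_t denseAddress(int64_t parentPos, int64_t levelSize, int64_t coord) {
  return levelSize * parentPos + coord;
}

// For a sparse slice, the slice coordinate is first mapped back to a
// coordinate of the underlying tensor. Assumed mapping, not taken from
// the patch: offset + coord * stride.
int64_t toSliceCoord(int64_t coord, int64_t offset, int64_t stride) {
  return offset + coord * stride;
}
```

Applied level by level, element (2, 3) of a dense 4x5 tensor ends up at position 2 * 5 + 3 = 13.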

LoopEmitter::LoopEmitter(ValueRange tensors, StringAttr loopTag, bool hasOutput,		LoopEmitter::LoopEmitter(ValueRange tensors, StringAttr loopTag, bool hasOutput,
bool isSparseOut, ArrayRef<unsigned> topSort,		bool isSparseOut, ArrayRef<unsigned> topSort,
DependentDimGetter getter) {		DependentDimGetter getter) {
initialize(tensors, loopTag, hasOutput, isSparseOut, topSort, getter);		initialize(tensors, loopTag, hasOutput, isSparseOut, topSort, getter);
[...]	void LoopEmitter::initialize(ValueRange tensors, StringAttr loopTag,
this->ptrBuffer.assign(tensors.size(), std::vector<Value>());		this->ptrBuffer.assign(tensors.size(), std::vector<Value>());
this->idxBuffer.assign(tensors.size(), std::vector<Value>());		this->idxBuffer.assign(tensors.size(), std::vector<Value>());
this->valBuffer.assign(tensors.size(), nullptr);		this->valBuffer.assign(tensors.size(), nullptr);
this->loopStack.reserve(topSort.size());		this->loopStack.reserve(topSort.size());
this->sparsiferLoopLvlMap.assign(topSort.size(), 0);		this->sparsiferLoopLvlMap.assign(topSort.size(), 0);
this->dependentDimMap.assign(		this->dependentDimMap.assign(
tensors.size(),		tensors.size(),
std::vector<std::vector<std::pair<unsigned, unsigned>>>());		std::vector<std::vector<std::pair<unsigned, unsigned>>>());
		this->slicePtrBuffer.assign(tensors.size(),
		std::vector<std::vector<Value>>());
		this->sliceSizes.assign(tensors.size(), std::vector<std::vector<Value>>());
		this->sliceStack.assign(tensors.size(), std::vector<SliceInfo>());
		this->sliceResolvedConstraints.assign(tensors.size(), 0);

for (size_t tid = 0, e = tensors.size(); tid < e; tid++) {		for (size_t tid = 0, e = tensors.size(); tid < e; tid++) {

auto t = tensors[tid];		auto t = tensors[tid];
// Skips scalars and zero-ranked tensors.		// Skips scalars and zero-ranked tensors.
		aartbik: why the empty lines here?
if (isZeroRankedTensorOrScalar(t.getType()))		if (isZeroRankedTensorOrScalar(t.getType()))
continue;		continue;
		wrengr: You should use the `numTensors` variable instead of calling `tensors.size()` repeatedly. (This is for code clarity rather than performance reasons.)
auto rtp = getRankedTensorType(t);		auto rtp = getRankedTensorType(t);
auto rank = static_cast<size_t>(rtp.getRank());		auto rank = static_cast<size_t>(rtp.getRank());
auto enc = getSparseTensorEncoding(rtp);		auto enc = getSparseTensorEncoding(rtp);
// We always treat a sparse output tensor as dense so that we always iterate		// We always treat a sparse output tensor as dense so that we always iterate
// it based on the dim size.		// it based on the dim size.
if (enc && !(isOutputTensor(tid) && isSparseOut)) {		if (enc && !(isOutputTensor(tid) && isSparseOut)) {
isSparseSlices[tid] = enc.isSlice();		isSparseSlices[tid] = enc.isSlice();
for (auto dimTp : enc.getDimLevelType())		for (auto dimTp : enc.getDimLevelType())
dimTypes[tid].push_back(dimTp);		dimTypes[tid].push_back(dimTp);
} else		} else
dimTypes[tid].assign(rank, DimLevelType::Dense);		dimTypes[tid].assign(rank, DimLevelType::Dense);

// Initializes with empty values.		// Initializes with empty values.
sliceOffsets[tid].assign(rank, Value());		sliceOffsets[tid].assign(rank, Value());
		wrengr: Please keep this as `Level l`
		wrengr: It would be clearer to combine these together. Also, it would be clearer to use `continue` rather than extra indentation for the conditional. Putting those together, maybe use something like `if (depends == 0) continue; assert(!reassoc); sliceSizes[...`
		aartbik: We need `depends - 1` slices to make sure you don't read depends as part of the sentence
sliceStrides[tid].assign(rank, Value());		sliceStrides[tid].assign(rank, Value());
dims[tid].assign(rank, Value());		dims[tid].assign(rank, Value());
pidxs[tid].assign(rank, Value());		pidxs[tid].assign(rank, Value());
coord[tid].assign(rank, Value());		coord[tid].assign(rank, Value());
highs[tid].assign(rank, Value());		highs[tid].assign(rank, Value());
ptrBuffer[tid].assign(rank, Value());		ptrBuffer[tid].assign(rank, Value());
idxBuffer[tid].assign(rank, Value());		idxBuffer[tid].assign(rank, Value());

		// Initialization related to slice-driven loops.
dependentDimMap[tid].assign(rank,		dependentDimMap[tid].assign(rank,
std::vector<std::pair<unsigned, unsigned>>());		std::vector<std::pair<unsigned, unsigned>>());
if (dimGetter)		slicePtrBuffer[tid].assign(rank, std::vector<Value>());
for (unsigned i = 0; i < rank; i++)		sliceSizes[tid].assign(rank, std::vector<Value>());
		sliceStack[tid].emplace_back(tensors[tid], /*minCoord=*/Value(),
		/*offset=*/Value(), /*isNonEmpty*/ Value(),
		std::nullopt, 0);
		if (dimGetter) {
		for (unsigned i = 0; i < rank; i++) {
dependentDimMap[tid][i] = dimGetter(tid, i);		dependentDimMap[tid][i] = dimGetter(tid, i);
		unsigned depends = dependentDimMap[tid][i].size();
		if (depends != 0) {
		// We need depends - 1 slices to fully resolve the affine expression.
		slicePtrBuffer[tid][i].assign(depends - 1, nullptr);
		sliceSizes[tid][i].assign(depends - 1, nullptr);
		}
		}
		}
}		}

// FIXME: This map should be maintained outside loop emitter.		// FIXME: This map should be maintained outside loop emitter.
for (unsigned i = 0, e = topSort.size(); i < e; i++) {		for (unsigned i = 0, e = topSort.size(); i < e; i++) {
// This is an inverse map of the topologically sorted loop index from		// This is an inverse map of the topologically sorted loop index from
// sparsifier. This is needed to map the AffineDimExpr back to the		// sparsifier. This is needed to map the AffineDimExpr back to the
// loopStack index used in loop emitter.		// loopStack index used in loop emitter.
sparsiferLoopLvlMap[topSort[i]] = i;		sparsiferLoopLvlMap[topSort[i]] = i;
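Since `topSort` is a permutation of the loop indices, the map built above is simply its inverse. A sketch with a hypothetical `invertTopSort` helper:

```cpp
#include <cassert>
#include <vector>

// topSort[i] is the sparsifier loop index placed at loop-stack depth i, so
// inverse[topSort[i]] = i recovers the depth of a given loop index (what is
// needed to map an AffineDimExpr back to a loopStack entry).
std::vector<unsigned> invertTopSort(const std::vector<unsigned> &topSort) {
  std::vector<unsigned> inverse(topSort.size(), 0);
  for (unsigned i = 0, e = topSort.size(); i < e; i++)
    inverse[topSort[i]] = i;
  return inverse;
}
```

For `topSort = {2, 0, 1}` the result is `{1, 2, 0}`: loop index 0 sits at loop-stack depth 1.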
[...]	if (!enc) {
// Annotated sparse tensors.		// Annotated sparse tensors.
// We also need the value buffer for an annotated all-dense `sparse` tensor.		// We also need the value buffer for an annotated all-dense `sparse` tensor.
valBuffer[t] = genToValues(builder, loc, tensor);		valBuffer[t] = genToValues(builder, loc, tensor);
}		}
// NOTE: we can also prepare for dim 0 here in advance; this would hoist		// NOTE: we can also prepare for dim 0 here in advance; this would hoist
// some loop preparation out of the tensor iteration, but would also		// some loop preparation out of the tensor iteration, but would also
// (undesirably) hoist the code outside of if conditions.		// (undesirably) hoist the code outside of if conditions.
}		}

		Type indexType = builder.getIndexType();
		Value c0 = constantZero(builder, loc, indexType);
		Value c2 = constantIndex(builder, loc, 2);
		wrengr: Please use `TensorId` here. I am just about to upload the CLs that make that into a newtype, so it's important to use the correct type instead of just using `size_t`/`unsigned` everywhere
		// TODO: We should probably use an integer type with the pointer bitwidth for the cache.
		wrengr: Please define a `dyn_cast` variant of the `getSparseTensorType` function, and use that here and everywhere else. The `SparseTensorType` was created specifically to help avoid several code legibility and correctness concerns, so you should be using it everywhere possible.
		MemRefType cacheTp = MemRefType::get({ShapedType::kDynamic}, indexType);
		// Generates the caches required to quickly compute the next non-empty
		// slices with increasing offsets for slice-based loops.
		// We need to start a separate loop here because the cache size depends on the
		wrengr: You should be using `SparseTensorType::getLevelRank` here, since it is specifically the level-rank you want, not the dim-rank.
		// dimension sizes computed in the loops above.
		wrengr: Please use `Level` for all levels. Even though it's just a typedef for now, I will be converting it to a proper type in the near future, so you should use the correct type rather than just using `unsigned` everywhere.
		for (size_t t = 0, e = tensors.size(); t < e; t++) {
		auto rtp = tensors[t].getType().dyn_cast<RankedTensorType>();
		aartbik: The comment applies to the assert, but the declaration is in between.
		if (!rtp)
		continue;

		// for a pair of [pLo, pHi]. Note that we cannot compress pHi because slicing
		// creates segments in the index buffer, so the pHi for the current dim
		// is no longer the pLo for the next dim.
		Value pIdxSize = c2;
		auto rank = rtp.getRank();
		for (unsigned lvl = 0; lvl < rank; lvl++) {
		if (!dependentDimMap[t][lvl].empty()) {
		// Needs at least two operands to form a non-trivial affine expression.
		ArrayRef<std::pair<unsigned, unsigned>> dependedDim =
		dependentDimMap[t][lvl];
		assert(dependedDim.size() > 1);

		Value size = c0;
		for (unsigned e = dependedDim.size() - 1; e >= 1; e--) {
		auto [dt, dd] = dependedDim[e];
		size = builder.create<arith::AddIOp>(loc, size, dims[dt][dd]);
		sliceSizes[t][lvl][e - 1] = size;
		}

		// No cache is needed for dense levels; they can simply be increased by one.
		auto dlt = dimTypes[t][lvl];

		if (!isDenseDLT(dlt)) {
		llvm::for_each(slicePtrBuffer[t][lvl], [cacheTp, pIdxSize, c2, loc,
		&builder](Value &cache) {
		cache = builder.create<memref::AllocaOp>(
		loc, cacheTp,
		// Two additional metadata slots {memSize, idx} at the head.
		builder.create<arith::AddIOp>(loc, pIdxSize, c2).getResult());
		});
		}

		// Accumulates the size required to cache the pLo for the slice.
		// E.g., to cache the pIdx for slice<d0xd1xf64> on the second level,
		// we need at most a memref<d0xindex>.
		// NOTE: this is apparently an over-approximation when the previous
		// level is compressed; we could compute a precise memory size
		// inside the loops, but that would also require us to allocate/free
		// memory in loops.
		// TODO: Maybe use memref::AllocaScopeOp inside the loop to resolve the issue?
		if (!dependentDimMap[t][lvl].empty()) {
		auto [dt, dd] = dependentDimMap[t][lvl].back();
		pIdxSize = builder.create<arith::MulIOp>(loc, pIdxSize, dims[dt][dd]);
		} else {
		// This level does not need to be sliced; the final size of the slice
		// on this level is the same as the current size.
		pIdxSize = builder.create<arith::MulIOp>(loc, pIdxSize, dims[t][lvl]);
		}
		}
		}
		}
}		}
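The two size computations in the cache-generation loop can be restated on plain integers. This sketch assumes all sizes are static (in the patch they are SSA values) and uses a hypothetical `LevelInfo` summary per level: `accumulateSliceSizes` mirrors the backward walk over `dependedDim`, and `cacheAllocSizes` mirrors how `pIdxSize` grows and how many elements each cache allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the inner loop over `dependedDim`: walking the operands from the
// last one down to the second, the running sum of their sizes becomes
// sliceSizes[e - 1]. operandSizes[e] stands in for dims[dt][dd].
std::vector<int64_t>
accumulateSliceSizes(const std::vector<int64_t> &operandSizes) {
  assert(operandSizes.size() > 1 && "needs a non-trivial affine expression");
  std::vector<int64_t> sliceSizes(operandSizes.size() - 1, 0);
  int64_t size = 0;
  for (std::size_t e = operandSizes.size() - 1; e >= 1; e--) {
    size += operandSizes[e];
    sliceSizes[e - 1] = size;
  }
  return sliceSizes;
}

// Hypothetical per-level summary of what the loop above inspects.
struct LevelInfo {
  bool isDense;       // isDenseDLT(dimTypes[t][lvl])
  bool isSliced;      // !dependentDimMap[t][lvl].empty()
  int64_t growFactor; // last depended dim size if sliced, else dims[t][lvl]
};

// Returns the element count of each cache allocated on each level (0 when no
// cache is allocated). pIdxSize starts at 2 for one [pLo, pHi] pair and is
// multiplied by each level's grow factor; every cache also reserves the two
// header slots {memSize, idx}.
std::vector<int64_t> cacheAllocSizes(const std::vector<LevelInfo> &levels) {
  int64_t pIdxSize = 2;
  std::vector<int64_t> allocSizes;
  allocSizes.reserve(levels.size());
  for (const LevelInfo &l : levels) {
    allocSizes.push_back(l.isSliced && !l.isDense ? pIdxSize + 2 : 0);
    pIdxSize *= l.growFactor;
  }
  return allocSizes;
}
```

As the NOTE in the code says, the running product is an over-approximation whenever a previous level is compressed.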

void LoopEmitter::enterNewLoopSeq(OpBuilder &builder, Location loc,		void LoopEmitter::enterNewLoopSeq(OpBuilder &builder, Location loc,
ArrayRef<size_t> tids,		ArrayRef<size_t> tids,
ArrayRef<size_t> dims) {		ArrayRef<size_t> dims) {
assert(loopSeqStack.size() == loopStack.size());		assert(loopSeqStack.size() == loopStack.size());
// Universal Index starts from 0.
loopSeqStack.emplace_back(constantIndex(builder, loc, 0));
// Prepares for all the tensors used in the current loop sequence.		// Prepares for all the tensors used in the current loop sequence.
for (auto [tid, dim] : llvm::zip(tids, dims))		std::vector<std::tuple<unsigned, unsigned, bool>> slicedTids;
		for (auto [tid, dim] : llvm::zip(tids, dims)) {
		if (!dependentDimMap[tid][dim].empty()) {
		bool fullyRes = genSliceBegin(builder, loc, tid, dim);
		slicedTids.emplace_back(tid, dim, fullyRes);
		} else {
prepareLoopOverTensorAtDim(builder, loc, tid, dim);		prepareLoopOverTensorAtDim(builder, loc, tid, dim);
}		}
		}

		// Universal Index starts from 0.
		loopSeqStack.emplace_back(constantIndex(builder, loc, 0),
		std::move(slicedTids));
		}

		void LoopEmitter::exitCurrentLoopSeq(OpBuilder &builder, Location loc) {
		assert(loopSeqStack.size() == loopStack.size() + 1);
		wrengr: I think it'd be clearer to just use `const auto &` here

		const std::vector<std::tuple<unsigned, unsigned, bool>> &slicedTids =
		aartbik: Please elaborate. "Pop out" is not at all representative for what follows
		loopSeqStack.back().second;

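		// Cleans up the slices created for this loop sequence: pops the slices
		// that are not fully resolved and, for fully resolved ones, advances the
		// cached pointer index past the current level's [pLo, pHi] pair.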
		for (auto [tid, lvl, res] : slicedTids) {
		if (!res) {
		assert(sliceStack[tid].back().slicedOnLvl == lvl);
		sliceStack[tid].pop_back();
		// There is an additional item in sliceStack for the input tensor.
		assert(sliceResolvedConstraints[tid] + 1 == sliceStack[tid].size());
		} else {
		Value c1 = constantIndex(builder, loc, 1);
		Value c2 = constantIndex(builder, loc, 2);

		// pIdx += 2: we have finished the current lvl, so advance the pointer index
		// of the previous level by two to skip the [pLo, pHi] pair of the current level.
		// TODO: we could probably use an SSA value for it.
		Value sPtrBuf = slicePtrBuffer[tid][lvl].back();
		Value curP = genIndexLoad(builder, loc, sPtrBuf, c1);
		Value nexP = builder.create<arith::AddIOp>(loc, curP, c2);
		builder.create<memref::StoreOp>(loc, nexP, sPtrBuf, c1);
		}
		}
		loopSeqStack.pop_back();
		}
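In the fully resolved branch above, slot 1 of the cache acts as a cursor that is loaded, bumped by two, and stored back. A hypothetical in-memory model of one cache, assuming the header layout {memSize, idx} described in `initialize` and zero-initialized slots (the real initial values are written elsewhere and are not shown here):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one slicePtrBuffer cache: slot 0 holds memSize,
// slot 1 holds idx (the cursor), the remaining slots hold [pLo, pHi] pairs.
struct SlicePtrCache {
  std::vector<int64_t> buf;

  explicit SlicePtrCache(int64_t pIdxSize)
      : buf(static_cast<std::size_t>(pIdxSize + 2), 0) {}

  int64_t cursor() const { return buf[1]; }

  // Mirrors the fully resolved branch of exitCurrentLoopSeq: load slot 1,
  // add 2, and store it back, skipping the finished level's [pLo, pHi] pair.
  void advance() { buf[1] += 2; }
};
```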

Value LoopEmitter::genAffine(OpBuilder &builder, AffineExpr a, Location loc) {		Value LoopEmitter::genAffine(OpBuilder &builder, AffineExpr a, Location loc) {
switch (a.getKind()) {		switch (a.getKind()) {
case AffineExprKind::DimId: {		case AffineExprKind::DimId: {
unsigned idx = a.cast<AffineDimExpr>().getPosition();		unsigned idx = a.cast<AffineDimExpr>().getPosition();
return loopStack[sparsiferLoopLvlMap[idx]].iv;		return loopStack[sparsiferLoopLvlMap[idx]].iv;
}		}
case AffineExprKind::Add: {		case AffineExprKind::Add: {
[...]	case AffineExprKind::Constant: {
int64_t c = a.cast<AffineConstantExpr>().getValue();		int64_t c = a.cast<AffineConstantExpr>().getValue();
return constantIndex(builder, loc, c);		return constantIndex(builder, loc, c);
}		}
default:		default:
llvm_unreachable("unexpected affine subscript");		llvm_unreachable("unexpected affine subscript");
}		}
}		}
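`genAffine` evaluates an affine subscript against the current loop induction variables, going through `sparsiferLoopLvlMap` to find each loop's depth on the stack. A self-contained sketch with a hypothetical `Expr` type in place of `mlir::AffineExpr`; it assumes the collapsed lines also handle `Mul`, combining its operands the way the `Add` case does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, minimal stand-in for mlir::AffineExpr.
struct Expr {
  enum class Kind { DimId, Add, Mul, Constant } kind;
  int64_t value; // dim position (DimId) or constant value (Constant)
  std::shared_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr dimExpr(int64_t pos) {
  return std::make_shared<Expr>(Expr{Expr::Kind::DimId, pos, nullptr, nullptr});
}
ExprPtr cstExpr(int64_t v) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Constant, v, nullptr, nullptr});
}
ExprPtr addExpr(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(
      Expr{Expr::Kind::Add, 0, std::move(l), std::move(r)});
}
ExprPtr mulExpr(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(
      Expr{Expr::Kind::Mul, 0, std::move(l), std::move(r)});
}

// loopIvs[depth] is the induction variable at loop-stack depth `depth`;
// loopDepthOf plays the role of sparsiferLoopLvlMap (inverse of topSort).
int64_t evalAffine(const Expr &a, const std::vector<int64_t> &loopIvs,
                   const std::vector<unsigned> &loopDepthOf) {
  switch (a.kind) {
  case Expr::Kind::DimId:
    return loopIvs[loopDepthOf[static_cast<std::size_t>(a.value)]];
  case Expr::Kind::Add:
    return evalAffine(*a.lhs, loopIvs, loopDepthOf) +
           evalAffine(*a.rhs, loopIvs, loopDepthOf);
  case Expr::Kind::Mul:
    return evalAffine(*a.lhs, loopIvs, loopDepthOf) *
           evalAffine(*a.rhs, loopIvs, loopDepthOf);
  case Expr::Kind::Constant:
    return a.value;
  }
  throw std::logic_error("unexpected affine subscript");
}
```

For `d0 + 2 * d1` with `topSort = {1, 0}`, loop d1 is outermost; with induction variables 5 (depth 0) and 7 (depth 1) the subscript evaluates to 7 + 2 * 5 = 17.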

Operation *LoopEmitter::enterLoopOverTensorAtDim(		Operation *LoopEmitter::emitForLoopOverTensorAtDim(OpBuilder &builder,
OpBuilder &builder, Location loc, ArrayRef<size_t> tids,		Location loc, size_t tid,
ArrayRef<size_t> dims, MutableArrayRef<Value> reduc, bool isParallel) {		size_t dim,
// TODO: support multiple return on parallel for?		MutableArrayRef<Value> reduc,
assert(!isParallel || reduc.size() <= 1);		bool isParallel) {
bool isSparseInput = false;		bool isSparseCond =
size_t tid = tids.front(), dim = dims.front();		isCompressedDLT(dimTypes[tid][dim]) || isSingletonDLT(dimTypes[tid][dim]);
for (auto [t, d] : llvm::zip(tids, dims)) {
assert(dimTypes[t].size() > d); // Must be a valid tid, dim pair
assert(!coord[t][d]); // We cannot re-enter the same level
auto dimType = dimTypes[t][d];
// Must be a recognizable DLT.
assert(isDenseDLT(dimType) || isCompressedDLT(dimType) ||
isSingletonDLT(dimType));
bool isSparse = isCompressedDLT(dimType) || isSingletonDLT(dimType);
// We can have at most one sparse input; otherwise, a while loop is
// required to co-iterate multiple sparse tensors.
assert(!isSparseInput || !isSparse);
if (isSparse) {
tid = t;
dim = d;
}
isSparseInput = isSparseInput \|\| isSparse;
}

// TODO: support dynamic slices.		// TODO: support dynamic slices.
Value step = constantIndex(builder, loc, 1);		Value step = constantIndex(builder, loc, 1);
Value lo = isSparseInput ? pidxs[tid][dim] // current offset		Value lo = isSparseCond ? pidxs[tid][dim] // current offset
: loopSeqStack.back(); // universal index		: loopSeqStack.back().first; // universal index
Value hi = highs[tid][dim];		Value hi = highs[tid][dim];

Operation *loop = nullptr;		Operation *loop = nullptr;
Value iv;		Value iv;
if (isParallel) {		if (isParallel) {
scf::ParallelOp parOp =		scf::ParallelOp parOp =
builder.create<scf::ParallelOp>(loc, lo, hi, step, reduc);		builder.create<scf::ParallelOp>(loc, lo, hi, step, reduc);
builder.setInsertionPointToStart(parOp.getBody());		builder.setInsertionPointToStart(parOp.getBody());
[...]	if (isParallel) {
assert(forOp.getNumRegionIterArgs() == reduc.size());		assert(forOp.getNumRegionIterArgs() == reduc.size());
for (int i = 0, e = reduc.size(); i < e; i++)		for (int i = 0, e = reduc.size(); i < e; i++)
reduc[i] = forOp.getRegionIterArg(i);		reduc[i] = forOp.getRegionIterArg(i);
loop = forOp;		loop = forOp;
}		}
assert(loop && iv);		assert(loop && iv);

Value c;		Value c;
if (isSparseInput) {		if (isSparseCond) {
pidxs[tid][dim] = iv;		pidxs[tid][dim] = iv;
// Generating a load on the indices array yields the coordinate.		// Generating a load on the indices array yields the coordinate.
Value ptr = idxBuffer[tid][dim];		Value ptr = idxBuffer[tid][dim];
c = genIndexLoad(builder, loc, ptr, iv);		c = genIndexLoad(builder, loc, ptr, iv);
} else {		} else {
// For a dense tensor, the coordinate is the induction variable.		// For a dense tensor, the coordinate is the induction variable.
c = iv;		c = iv;
}		}

if (isSparseSlices[tid] && isSparseInput) {		if (isSparseSlices[tid] && isSparseCond) {
// For sparse level slices, we need to filter out invalid coordinates that		// For sparse level slices, we need to filter out invalid coordinates that
// are not included in the slice.		// are not included in the slice.
SmallVector<Type> types;		SmallVector<Type> types;
for (Value red : reduc)		for (Value red : reduc)
types.push_back(red.getType());		types.push_back(red.getType());

auto [trans, pred] = genSliceLegitPredicate(builder, loc, c, tid, dim);		auto [trans, pred] = genSliceLegitPredicate(builder, loc, c, tid, dim);
bool hasReduc = !types.empty();		bool hasReduc = !types.empty();
[...]	if (isSparseSlices[tid] && isSparseCond) {
}		}
// Set the insertion point to the matched branch.		// Set the insertion point to the matched branch.
builder.setInsertionPointToStart(&ifOp.getThenRegion().front());		builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
c = trans;		c = trans;
}		}

assert(c);		assert(c);
coord[tid][dim] = c;		coord[tid][dim] = c;
		return loop;
		}
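For sparse slices, each loaded coordinate is filtered through `genSliceLegitPredicate`, whose body is collapsed in this view. Assuming the usual strided-slice semantics (a coordinate `c` belongs to the slice iff `c >= offset`, `(c - offset) % stride == 0`, and `(c - offset) / stride < size`), the predicate and the translated coordinate can be sketched as below; `sliceRelativeCoord` is a hypothetical name, and returning `std::nullopt` stands for the predicate being false.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns the slice-relative coordinate of `c`, or std::nullopt when `c` is
// not part of the slice described by (offset, size, stride).
std::optional<int64_t> sliceRelativeCoord(int64_t c, int64_t offset,
                                          int64_t size, int64_t stride) {
  assert(stride > 0);
  if (c < offset)
    return std::nullopt;
  int64_t d = c - offset;
  if (d % stride != 0)
    return std::nullopt;
  int64_t rel = d / stride;
  if (rel >= size)
    return std::nullopt;
  return rel;
}
```

In the emitted IR, the `scf.if` built from the predicate plays the role of the empty optional: on a mismatch, the reductions are yielded unchanged.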

		Operation *LoopEmitter::enterLoopOverTensorAtDim(
		OpBuilder &builder, Location loc, ArrayRef<size_t> tids,
		ArrayRef<size_t> dims, MutableArrayRef<Value> reduc, bool isParallel) {
		// TODO: support multiple return on parallel for?
		assert(!isParallel || reduc.size() <= 1);
		bool isSparseCond = false, isSliceCond = false;
		size_t tid = tids.front(), dim = dims.front();

		for (auto [t, d] : llvm::zip(tids, dims)) {
		assert(dimTypes[t].size() > d); // Must be a valid tid, dim pair
		wrengr: Use "l" or "lvl" here. The name "d" is reserved for things of `Dimension` type, whereas this has `Level` type.
		assert(!coord[t][d] || // We cannot re-enter the same level
		!dependentDimMap[t][d].empty()); // unless it is a slice-driven loop
		auto dimType = dimTypes[t][d];
		// Must be a recognizable DLT.
		assert(isDenseDLT(dimType) || isCompressedDLT(dimType) ||
		isSingletonDLT(dimType));

		// This is a slice-driven loop.
		if (!dependentDimMap[t][d].empty()) {
		assert(!isSliceCond && !isSparseCond);
		isSliceCond = true;
		tid = t;
		dim = d;
		continue;
		}

		bool isSparse = isCompressedDLT(dimType) || isSingletonDLT(dimType);
		// We can have at most one sparse input; otherwise, a while loop is
		// required to co-iterate multiple sparse tensors.
		assert(!isSparseCond || !isSparse);
		assert(!isSliceCond || !isSparseCond);
		if (isSparse) {
		tid = t;
		dim = d;
		}
		isSparseCond = isSparseCond || isSparse;
		}

		// If the slice is fully reduced, we can now use the TACO-based algorithm to
		aartbik: If the slice
		// iterate over it.
		Operation *l = nullptr;
		aartbik: I find this block of code extremely hard to read. Any way to factor this into slightly smaller methods and combine these?
		Peiming: better now?
		aartbik: Yes, although it could still use a bit more doc on what each block does (on entry of each block). Also, I would not overuse the "NOTE" part, in principle, all comments are NOTEs and we should only use them when something really should jump out
		if (isSliceCond) {
		bool fullyResolved = sliceFullyResolved(tid);
		if (!fullyResolved) {
		l = emitSliceDrivenLoopOverTensorAtDim(builder, loc, tid, dim, reduc);
		} else {
		const SliceInfo &info = getFinalSliceOnLvl(tid, dim);
		Value offset = info.offset;
		unsigned depth = info.depth - 1;
		Operation *insertPoint = nullptr;
		// TODO: we should generalize the method to support iteration over
		// normal slices as well to allow early break.
		l = genSliceLvlTraverseLoop(
		builder, loc, pidxs[tid][dim], highs[tid][dim], offset, tid, dim,
		depth, reduc,
		/*genYield=*/false, // unaware of the yield values from user yet
		[this, tid, dim, reduc, offset,
		&insertPoint](OpBuilder &builder, Location loc, Value iv,
		MutableArrayRef<Value> innerReduc) {
		assert(innerReduc.size() == reduc.size());
		// Updates the users' reduction variables in place.
		for (unsigned i = 0, e = reduc.size(); i < e; i++)
		reduc[i] = innerReduc[i];
		// Loads the coordinates.
		Value absC =
		genIndexLoad(builder, loc, idxBuffer[tid][dim], iv);

		// We need to subtract the offset to get relative coordinates.
		// TODO: how to assert relC >= 0 at runtime?
		insertPoint = builder.create<arith::SubIOp>(loc, absC, offset);
		pidxs[tid][dim] = iv;
		coord[tid][dim] = insertPoint->getResult(0);
		})
		.first;
		// We have not finished the loop body yet, so reset the insertion point
		// and delegate the rest to the user.
		builder.setInsertionPointAfter(insertPoint);
		}
// NOTE: we can also prepare for the next dim here in advance.		// NOTE: we can also prepare for the next dim here in advance.
// Push the loop onto the stack.		// Pushes the loop onto the stack.
loopStack.emplace_back(ArrayRef<size_t>(tid), ArrayRef<size_t>(dim), loop,		loopStack.emplace_back(
		ArrayRef<size_t>(), ArrayRef<size_t>(), ArrayRef<size_t>(tid),
		ArrayRef<size_t>(dim), ArrayRef<bool>(fullyResolved), l,
builder.getInsertionBlock(), coord[tid][dim], loopTag);		builder.getInsertionBlock(), coord[tid][dim], loopTag);
		} else {
		l = emitForLoopOverTensorAtDim(builder, loc, tid, dim, reduc, isParallel);
		// NOTE: we can also prepare for the next dim here in advance.
		// Pushes the loop onto the stack.
		loopStack.emplace_back(ArrayRef<size_t>(tid), ArrayRef<size_t>(dim),
		ArrayRef<size_t>(), ArrayRef<size_t>(),
		ArrayRef<bool>(), l, builder.getInsertionBlock(),
		coord[tid][dim], loopTag);
		}

// Emit extra locals.		// Emit extra locals.
emitExtraLocalsForTensorsAtDenseDims(builder, loc, tids, dims);		emitExtraLocalsForTensorsAtDenseDims(builder, loc, tids, dims);
		return l;
return loop;
}		}
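The selection loop at the top of `enterLoopOverTensorAtDim` decides which (tid, lvl) pair the emitted loop iterates over. The sketch below mirrors its control flow on hypothetical `Candidate`/`Driver` structs and, as the patch does, treats compressed and singleton levels as sparse.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class LevelKind { Dense, Compressed, Singleton };

// Hypothetical summary of one (tid, lvl) pair passed to the loop entry.
struct Candidate {
  std::size_t tid;
  std::size_t lvl;
  LevelKind kind;
  bool sliceDriven; // !dependentDimMap[tid][lvl].empty()
};

struct Driver {
  std::size_t tid;
  std::size_t lvl;
  bool isSparseCond;
  bool isSliceCond;
};

Driver pickLoopDriver(const std::vector<Candidate> &cands) {
  assert(!cands.empty());
  // Defaults to the first pair, like `tid = tids.front()`.
  Driver d{cands.front().tid, cands.front().lvl, false, false};
  for (const Candidate &c : cands) {
    if (c.sliceDriven) {
      assert(!d.isSliceCond && !d.isSparseCond);
      d.isSliceCond = true;
      d.tid = c.tid;
      d.lvl = c.lvl;
      continue;
    }
    bool isSparse = c.kind != LevelKind::Dense;
    // At most one sparse input; co-iteration needs a while loop instead.
    assert(!d.isSparseCond || !isSparse);
    assert(!d.isSliceCond || !d.isSparseCond);
    if (isSparse) {
      d.tid = c.tid;
      d.lvl = c.lvl;
    }
    d.isSparseCond = d.isSparseCond || isSparse;
  }
  return d;
}
```

As the assertions are ordered in this revision, the second one is checked before `isSparseCond` is updated, so a sparse pair that follows a slice-driven pair is not rejected and takes over as the driver; the sketch preserves that behavior.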

Operation *LoopEmitter::enterFilterLoopOverTensorAtDim(		Operation *LoopEmitter::enterFilterLoopOverTensorAtDim(
OpBuilder &builder, Location loc, size_t tid, size_t dim, AffineExpr affine,		OpBuilder &builder, Location loc, size_t tid, size_t dim, AffineExpr affine,
MutableArrayRef<Value> reduc) {		MutableArrayRef<Value> reduc) {
assert(!affine.isa<AffineDimExpr>() && !isDenseDLT(dimTypes[tid][dim]));		assert(!affine.isa<AffineDimExpr>() && !isDenseDLT(dimTypes[tid][dim]));
assert(dimTypes[tid].size() > dim);		assert(dimTypes[tid].size() > dim);
// We cannot re-enter the same level.		// We cannot re-enter the same level.
[...]	Operation *LoopEmitter::enterFilterLoopOverTensorAtDim(
// index.		// index.
scf::ForOp forOp = builder.create<scf::ForOp>(loc, lo, hi, step, reduc);		scf::ForOp forOp = builder.create<scf::ForOp>(loc, lo, hi, step, reduc);

// In-place update on the reduction variable vector.		// In-place update on the reduction variable vector.
assert(forOp.getNumRegionIterArgs() == reduc.size());		assert(forOp.getNumRegionIterArgs() == reduc.size());
for (int i = 0, e = reduc.size(); i < e; i++)		for (int i = 0, e = reduc.size(); i < e; i++)
reduc[i] = forOp.getRegionIterArg(i);		reduc[i] = forOp.getRegionIterArg(i);

builder.setInsertionPointToStart(forOp.getBody());		builder.setInsertionPointToStart(forOp.getBody());
		aartbik: isn't that always the case here? Should that not be part of the method description then?
Value iv = forOp.getInductionVar();		Value iv = forOp.getInductionVar();

pidxs[tid][dim] = iv;		pidxs[tid][dim] = iv;
// Generating a load on the indices array yields the coordinate.		// Generating a load on the indices array yields the coordinate.
Value ptr = idxBuffer[tid][dim];		Value ptr = idxBuffer[tid][dim];
coord[tid][dim] = genIndexLoad(builder, loc, ptr, iv);		coord[tid][dim] = genIndexLoad(builder, loc, ptr, iv);

// Generate an if condition to filter out indices that are not equal to the		// Generate an if condition to filter out indices that are not equal to the
[...]	if (hasReduc) {
// On mismatch.		// On mismatch.
builder.create<scf::YieldOp>(loc, reduc);		builder.create<scf::YieldOp>(loc, reduc);
}		}
// Set the insertion point to the matched branch.		// Set the insertion point to the matched branch.
builder.setInsertionPointToStart(&ifOp.getThenRegion().front());		builder.setInsertionPointToStart(&ifOp.getThenRegion().front());

// NOTE: we can also prepare for the next dim here in advance.		// NOTE: we can also prepare for the next dim here in advance.
// Push the loop onto the stack.		// Push the loop onto the stack.
loopStack.emplace_back(ArrayRef<size_t>(tid), ArrayRef<size_t>(dim), forOp,		loopStack.emplace_back(ArrayRef<size_t>(tid), ArrayRef<size_t>(dim),
builder.getInsertionBlock(), coord[tid][dim], nullptr);		ArrayRef<size_t>(), ArrayRef<size_t>(),
		ArrayRef<bool>(), forOp, builder.getInsertionBlock(),
		coord[tid][dim], nullptr);
return forOp;		return forOp;
}		}
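The filter loop visits every stored position of a compressed level and keeps only the positions whose coordinate equals the value of the non-trivial affine subscript. A plain C++ sketch, assuming a running sum as a stand-in for the user's reduction variables and a hypothetical `filterLoopSum` name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Iterates positions [lo, hi) of one segment of a compressed level. Each
// coordinate is loaded from `indices` (like genIndexLoad on idxBuffer); only
// coordinates equal to `target` (the evaluated affine subscript) contribute.
int64_t filterLoopSum(const std::vector<int64_t> &indices,
                      const std::vector<int64_t> &values, std::size_t lo,
                      std::size_t hi, int64_t target) {
  int64_t reduc = 0;
  for (std::size_t pos = lo; pos < hi; pos++) {
    int64_t coord = indices[pos];
    if (coord == target)
      reduc += values[pos]; // "then" branch: user code runs here
    // Otherwise the "else" branch yields the reduction unchanged.
  }
  return reduc;
}
```

Here positions [0, 3) and [3, 5) model two segments of the level, as the pointer buffer of a compressed level would delimit them.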

void LoopEmitter::genDenseAffineAddressAtCurLevel(OpBuilder &builder,		void LoopEmitter::genDenseAffineAddressAtCurLevel(OpBuilder &builder,
Location loc, size_t tid,		Location loc, size_t tid,
size_t dim,		size_t dim,
AffineExpr affine) {		AffineExpr affine) {
Value affineV = genAffine(builder, affine, loc);		Value affineV = genAffine(builder, affine, loc);
pidxs[tid][dim] = genAddress(builder, loc, tid, dim, affineV);		pidxs[tid][dim] = genAddress(builder, loc, tid, dim, affineV);
}		}

Operation *LoopEmitter::enterCoIterationOverTensorsAtDims(		Operation *LoopEmitter::enterCoIterationOverTensorsAtDims(
		aartbik: A note "make sure" is very ambiguous. Is that a note to self, or something that the code actively does? Much better is to use an affirmative statement.
OpBuilder &builder, Location loc, ArrayRef<size_t> tids,		OpBuilder &builder, Location loc, ArrayRef<size_t> tids,
		aartbik: "appears first than normal tensors" -> "appears before normal tensors"?
ArrayRef<size_t> dims, bool needsUniv, MutableArrayRef<Value> reduc) {		ArrayRef<size_t> dims, bool needsUniv, MutableArrayRef<Value> reduc) {

		// NOTE: the reduction variables related to slice-driven tensors must
		// appear before those of normal tensors.
assert(tids.size() == dims.size());		assert(tids.size() == dims.size());
SmallVector<Type> types;		SmallVector<Type> types;
SmallVector<Value> operands;		SmallVector<Value> operands;
// Construct the while-loop with a parameter for each index.		// Construct the while-loop with a parameter for each index.
		wrengr: Why remove the const? It's clearer to know when local variables will never change
		Peiming: Rebase mistake.
Type indexType = builder.getIndexType();		Type indexType = builder.getIndexType();
		wrengr: Please don't undo my factoring this out into a local variable. The condition is much easier to read when (1) it's all on one line, and (2) avoids repeating common expressions which forces the reader to double check if they are indeed the same or not.
		Peiming: Okay, it is a mistake I made during rebasing.
for (auto [tid, dim] : llvm::zip(tids, dims)) {		for (auto [tid, dim] : llvm::zip(tids, dims)) {
		// TODO: support co-iteration with slice-driven tensors.
		assert(dependentDimMap[tid][dim].empty() && "TODO: not yet implemented");
if (isCompressedDLT(dimTypes[tid][dim]) ||		if (isCompressedDLT(dimTypes[tid][dim]) ||
isSingletonDLT(dimTypes[tid][dim])) {		isSingletonDLT(dimTypes[tid][dim])) {
assert(pidxs[tid][dim]);		assert(pidxs[tid][dim]);
types.push_back(indexType);		types.push_back(indexType);
operands.push_back(pidxs[tid][dim]);		operands.push_back(pidxs[tid][dim]);
}		}
}		}
// The position where the user-supplied reduction variables start.		// The position where the user-supplied reduction variables start.
for (Value rec : reduc) {		for (Value rec : reduc) {
types.push_back(rec.getType());		types.push_back(rec.getType());
operands.push_back(rec);		operands.push_back(rec);
}		}
if (needsUniv) {		if (needsUniv) {
types.push_back(indexType);		types.push_back(indexType);
// Update universal index.		// Update universal index.
operands.push_back(loopSeqStack.back());		operands.push_back(loopSeqStack.back().first);
}		}
assert(types.size() == operands.size());		assert(types.size() == operands.size());
scf::WhileOp whileOp = builder.create<scf::WhileOp>(loc, types, operands);		scf::WhileOp whileOp = builder.create<scf::WhileOp>(loc, types, operands);

SmallVector<Location> locs(types.size(), loc);		SmallVector<Location> locs(types.size(), loc);
Block *before = builder.createBlock(&whileOp.getBefore(), {}, types, locs);		Block *before = builder.createBlock(&whileOp.getBefore(), {}, types, locs);
Block *after = builder.createBlock(&whileOp.getAfter(), {}, types, locs);		Block *after = builder.createBlock(&whileOp.getAfter(), {}, types, locs);

[88 unchanged lines not shown; enclosing context: if (!needsUniv) {]
}		}
} else {		} else {
assert(!min);		assert(!min);
// Otherwise, universal index is the minimal pidx.		// Otherwise, universal index is the minimal pidx.
min = after->getArguments().back();		min = after->getArguments().back();
}		}

// Sets up the loop stack.		// Sets up the loop stack.
loopStack.emplace_back(tids, dims, whileOp, builder.getInsertionBlock(), min,		loopStack.emplace_back(tids, dims, ArrayRef<size_t>(), ArrayRef<size_t>(),
loopTag);		ArrayRef<bool>(), whileOp, builder.getInsertionBlock(),
		min, loopTag);
assert(loopStack.size() == loopSeqStack.size());		assert(loopStack.size() == loopSeqStack.size());

// Emits extra locals		// Emits extra locals
emitExtraLocalsForTensorsAtDenseDims(builder, loc, tids, dims);		emitExtraLocalsForTensorsAtDenseDims(builder, loc, tids, dims);

// Updates reduction variables		// Updates reduction variables
assert(after->getNumArguments() == o + reduc.size() + (needsUniv ? 1 : 0));		assert(after->getNumArguments() == o + reduc.size() + (needsUniv ? 1 : 0));
// In-place update on reduction variable.		// In-place update on reduction variable.
[148 unchanged lines not shown]
void LoopEmitter::exitWhileLoop(OpBuilder &builder, Location loc,		void LoopEmitter::exitWhileLoop(OpBuilder &builder, Location loc,
MutableArrayRef<Value> reduc) {		MutableArrayRef<Value> reduc) {
const LoopLevelInfo &loopInfo = loopStack.back();		const LoopLevelInfo &loopInfo = loopStack.back();
auto whileOp = llvm::cast<scf::WhileOp>(loopInfo.loop);		auto whileOp = llvm::cast<scf::WhileOp>(loopInfo.loop);
builder.setInsertionPointToEnd(loopInfo.userCodeBlock);		builder.setInsertionPointToEnd(loopInfo.userCodeBlock);
auto &dims = loopInfo.dims;		auto &dims = loopInfo.dims;
auto &tids = loopInfo.tids;		auto &tids = loopInfo.tids;
Value iv = loopInfo.iv;		Value iv = loopInfo.iv;

// Finalize the induction. Note that the induction could be performed		// Finalize the induction. Note that the induction could be performed
// in the individual if-branches to avoid re-evaluating the conditions.		// in the individual if-branches to avoid re-evaluating the conditions.
// However, that would result in a rather elaborate forest of yield		// However, that would result in a rather elaborate forest of yield
// instructions during code generation. Moreover, performing the induction		// instructions during code generation. Moreover, performing the induction
// after the if-statements more closely resembles code generated by TACO.		// after the if-statements more closely resembles code generated by TACO.
unsigned o = 0;		unsigned o = 0;
SmallVector<Value> operands;		SmallVector<Value> operands;
		unsigned delta = 0;
		for (auto [tid, dim, resolved] : llvm::zip(
		loopInfo.slicedTids, loopInfo.slicedDims, loopInfo.sliceResolved)) {
		if (!resolved) {
		genSliceNextInduction(builder, loc, whileOp, tid, dim, operands, o);
		sliceResolvedConstraints[tid]--;
		} else {
		// TODO: We need to distinguish a coiterating loop from a slice-driven loop
		// and from a fully reduced while op that iterates over one slice.
		// Since coiteration is not implemented yet, this must be an iteration
		// over a single fully resolved slice.
		assert(loopInfo.slicedTids.size() == 1 && loopInfo.tids.empty());
		// The if guard that filters out out-of-range coordinates.
		assert(llvm::isa<scf::IfOp>(builder.getInsertionBlock()->getParentOp()));
		pidxs[tid][dim] = whileOp->getResult(o++);
		// FIXME: we are not using continue here since we do not support
		// coiteration on slices. But it needs to be treated similarly to the
		// universal index.
		o++; // skip continue flag.
		// Since we did not push these two results of the whileOp, the operands
		// vector is smaller than the actual number of whileOp results. This is
		// because the yield is generated in the IfOp inside the whileOp, which
		// only iterates over in-bound coordinates within the slice.
		delta += 2;
		}
		}

Value one = constantIndex(builder, loc, 1);		Value one = constantIndex(builder, loc, 1);
for (auto [tid, dim] : llvm::zip(tids, dims)) {		for (auto [tid, dim] : llvm::zip(tids, dims)) {
if (isCompressedDLT(dimTypes[tid][dim]) ||		if (isCompressedDLT(dimTypes[tid][dim]) ||
isSingletonDLT(dimTypes[tid][dim])) {		isSingletonDLT(dimTypes[tid][dim])) {
Value op1 = coord[tid][dim];		Value op1 = coord[tid][dim];
Value op3 = pidxs[tid][dim];		Value op3 = pidxs[tid][dim];
Value cmp =		Value cmp =
		wrengr: It would be clearer to use `break` in the then-branch. That keeps you from needing to indent the else-branch (which is very long), and helps the reader avoid needing to check to see if there's something else after the else-branch.
		Peiming (author): I found `else` is easier to follow, because the control flow is more straightforward.
		aartbik: I agree the else is very long and deep. Why not `if (!resolved) { genSlice...; continue; } ...`?
builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::eq, op1, iv);		builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::eq, op1, iv);
Value add = builder.create<arith::AddIOp>(loc, op3, one);		Value add = builder.create<arith::AddIOp>(loc, op3, one);
operands.push_back(builder.create<arith::SelectOp>(loc, cmp, add, op3));		operands.push_back(builder.create<arith::SelectOp>(loc, cmp, add, op3));
// Following loops continue iteration from the break point of the		// Following loops continue iteration from the break point of the
// current while loop.		// current while loop.
pidxs[tid][dim] = whileOp->getResult(o++);		pidxs[tid][dim] = whileOp->getResult(o++);
// The coordinates are invalid now.		// The coordinates are invalid now.
coord[tid][dim] = nullptr;		coord[tid][dim] = nullptr;
// highs remains unchanged.		// highs remains unchanged.
}		}
}		}

// Reduction value from users.		// Reduction value from users.
for (auto &i : reduc) {		for (auto &i : reduc) {
operands.push_back(i);		operands.push_back(i);
// In place update reduction variable.		// In place update reduction variable.
i = whileOp->getResult(o++);		i = whileOp->getResult(o++);
}		}

// An (optional) universal index.		// An (optional) universal index.
if (operands.size() < whileOp.getNumResults()) {		if (operands.size() + delta < whileOp.getNumResults()) {
assert(operands.size() + 1 == whileOp.getNumResults());		assert(operands.size() + delta + 1 == whileOp.getNumResults());
// The last one is the universal index.		// The last one is the universal index.
operands.push_back(builder.create<arith::AddIOp>(loc, iv, one));		operands.push_back(builder.create<arith::AddIOp>(loc, iv, one));
// update the loop starting point of current loop sequence		// update the loop starting point of current loop sequence
loopSeqStack.back() = whileOp->getResult(o++);		loopSeqStack.back().first = whileOp->getResult(o++);
}		}

assert(o == operands.size());		assert(o == operands.size() + delta);
builder.create<scf::YieldOp>(loc, operands);		builder.create<scf::YieldOp>(loc, operands);
builder.setInsertionPointAfter(whileOp);		builder.setInsertionPointAfter(whileOp);
}		}
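The early-`continue` shape suggested in the review thread on `exitWhileLoop` can be sketched in plain C++ (not MLIR) to check that it keeps the same result-index bookkeeping. `SlicedLevel`, `YieldCounts` and `countSliceYields` are hypothetical names introduced only for illustration; the per-case counts (four values per unresolved slice, two skipped results per resolved slice) are taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one entry of loopInfo.slicedTids,
// loopInfo.slicedDims and loopInfo.sliceResolved.
struct SlicedLevel {
  std::size_t tid;
  std::size_t lvl;
  bool resolved;
};

// Hypothetical summary of the yield bookkeeping in exitWhileLoop.
struct YieldCounts {
  unsigned numOperands = 0; // values pushed into `operands`
  unsigned resultIdx = 0;   // `o`: whileOp results consumed so far
  unsigned delta = 0;       // results yielded by the inner scf.if instead
};

// The slice loop of exitWhileLoop written with an early `continue`, so the
// resolved case no longer sits in a long, deeply indented `else` branch.
YieldCounts countSliceYields(const std::vector<SlicedLevel> &levels) {
  YieldCounts counts;
  for (const SlicedLevel &level : levels) {
    if (!level.resolved) {
      // genSliceNextInduction pushes four operands (isNonEmpty, slice,
      // minCoord, offset) and consumes four whileOp results.
      counts.numOperands += 4;
      counts.resultIdx += 4;
      continue;
    }
    // Fully resolved slice: the position and the continue flag are yielded
    // inside the scf.if, so both results are skipped here.
    counts.resultIdx += 2;
    counts.delta += 2;
  }
  return counts;
}
```

Once the user reductions and the optional universal index are appended (each adds one operand and consumes one result), `resultIdx == numOperands + delta` is the same invariant the diff checks with `assert(o == operands.size() + delta)`.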

void LoopEmitter::exitCurrentLoop(RewriterBase &rewriter, Location loc,		void LoopEmitter::exitCurrentLoop(RewriterBase &rewriter, Location loc,
MutableArrayRef<Value> reduc) {		MutableArrayRef<Value> reduc) {
// Clean up the values; it helps us discover potential bugs at an		// Clean up the values; it helps us discover potential bugs at an
// earlier stage (instead of silently using a wrong value).		// earlier stage (instead of silently using a wrong value).
LoopLevelInfo &loopInfo = loopStack.back();		LoopLevelInfo &loopInfo = loopStack.back();
assert(loopInfo.tids.size() == loopInfo.dims.size());		assert(loopInfo.tids.size() == loopInfo.dims.size());
SmallVector<Value> red;		SmallVector<Value> red;
if (llvm::isa<scf::WhileOp>(loopInfo.loop)) {		if (llvm::isa<scf::WhileOp>(loopInfo.loop)) {
exitWhileLoop(rewriter, loc, reduc);		exitWhileLoop(rewriter, loc, reduc);
} else {		} else {
exitForLoop(rewriter, loc, reduc);		exitForLoop(rewriter, loc, reduc);
}		}

assert(loopStack.size() == loopSeqStack.size());		assert(loopStack.size() == loopSeqStack.size());
loopStack.pop_back();		loopStack.pop_back();
}		}

		//===----------------------------------------------------------------------===//
		// Slice-driven loop related methods.
		aartbik: Ok, this block is where all the magic happens ;-) I need to do one more careful pass over this...
		//===----------------------------------------------------------------------===//

		LoopEmitter::SliceInfo &LoopEmitter::getFinalSliceOnLvl(size_t tid,
		aartbik: I think this still needs some work to make reading the block easier. The problem is that you have very concise comments in the header (Generates .....), which is okay, since I don't want to see more there, but very few comments here, where it matters. So I would still give every implementation function here an entry comment, but one that shows what is generated, using some pseudo-code of the output. That way, on entry of each method, I know what to expect, and dive into the various blocks with more pre-knowledge on what they do. WDYT?
		size_t lvl) {
		for (auto it = sliceStack[tid].rbegin(), ie = sliceStack[tid].rend(); it < ie;
		it++) {
		if (it->slicedOnLvl == lvl) {
		assert(it->depth == dependentDimMap[tid][lvl].size() - 1);
		return *it;
		}
		}

		llvm_unreachable("Failed to find sliceInfo");
		}

		size_t LoopEmitter::sliceTotalConstraints(size_t tid) {
		aartbik: this one seems out of place (all others generate stuff); perhaps move it up or down in the method order (also in header)
		size_t numConstraints = 0;
		for (const auto &lvlDeps : dependentDimMap[tid]) {
		if (!lvlDeps.empty()) {
		assert(lvlDeps.size() >= 2);
		numConstraints += lvlDeps.size() - 1;
		}
		}
		return numConstraints;
		}

		bool LoopEmitter::sliceFullyResolved(size_t tid) {
		return sliceTotalConstraints(tid) == sliceResolvedConstraints[tid];
		}
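Since `sliceTotalConstraints` and `sliceFullyResolved` only count and do not generate IR, their logic fits in a few lines of plain C++. This sketch assumes each `dependentDimMap[tid]` entry lists the loops a level's affine expression depends on and is empty for a plain dimension (which is what the `assert(lvlDeps.size() >= 2)` implies); `LevelDeps`, `totalConstraints` and `fullyResolved` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of dependentDimMap[tid]: for each level, the loops
// that the level's affine expression depends on (empty for a plain dim).
using LevelDeps = std::vector<std::vector<unsigned>>;

// Mirrors sliceTotalConstraints: a level whose expression depends on
// k >= 2 loops contributes k - 1 constraints.
std::size_t totalConstraints(const LevelDeps &deps) {
  std::size_t n = 0;
  for (const auto &lvlDeps : deps) {
    if (!lvlDeps.empty()) {
      assert(lvlDeps.size() >= 2);
      n += lvlDeps.size() - 1;
    }
  }
  return n;
}

// Mirrors sliceFullyResolved: all constraints have been resolved.
bool fullyResolved(const LevelDeps &deps, std::size_t resolved) {
  return totalConstraints(deps) == resolved;
}
```

For an access like `t[d0 + d1, d2, d3 + d4 + d5]`, this gives 1 + 0 + 2 = 3 constraints.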

		std::pair<Operation *, ValueRange> LoopEmitter::genSliceLvlTraverseLoop(
		OpBuilder &builder, Location loc, Value loopLo, Value loopHi, Value offset,
		size_t tid, size_t lvl, size_t depth, ValueRange userReduc, bool genYield,
		llvm::function_ref<void(OpBuilder &, Location, Value,
		MutableArrayRef<Value>)>
		bodyBuilder) {
		Value c1 = constantIndex(builder, loc, 1);
		Value sliceHi =
		builder.create<arith::AddIOp>(loc, offset, sliceSizes[tid][lvl].back());

		SmallVector<Value> reduc = {
		loopLo, // loop lower bounds
		constantI1(builder, loc, true), // continue
		};
		// Append user required reduction value.
		reduc.append(userReduc.begin(), userReduc.end());
		SmallVector<Type> types = getTypesFromValues(reduc);

		scf::WhileOp whileOp = builder.create<scf::WhileOp>(
		loc, types, reduc,
		/*beforeBuilder=*/
		[loopHi](OpBuilder &builder, Location loc, ValueRange args) {
		Value lo = args[0];
		Value cont = args[1];
		Value inBound = builder.create<arith::CmpIOp>(
		loc, arith::CmpIPredicate::ult, lo, loopHi);
		Value cond = builder.create<arith::AndIOp>(loc, cont, inBound);
		// Continue if we have neither broken out nor gone out of bounds.
		builder.create<scf::ConditionOp>(loc, cond, args);
		},
		/*afterBuilder=*/
		[this, c1, tid, lvl, sliceHi, genYield,
		bodyBuilder](OpBuilder &builder, Location loc, ValueRange args) {
		Value iv = args[0];
		Value coord = genIndexLoad(builder, loc, idxBuffer[tid][lvl], iv);
		// If coord < sliceHi
		Value cont = builder.create<arith::CmpIOp>(
		loc, arith::CmpIPredicate::ult, coord, sliceHi);

		SmallVector<Type> types = getTypesFromValues(args.drop_front(2));
		auto ifOp = builder.create<scf::IfOp>(loc, types, cont, true);
		{
		// Drop the 2 reduction variables maintained by us.
		aartbik: here and in a few other places, no period at end; please make one last pass over all new comments here
		SmallVector<Value> ifRet = args.drop_front(2);
		assert(ifRet.size() == args.size() - 2);

		OpBuilder::InsertionGuard guard(builder);
		// If not in the slice, break the while loop (by setting continue to
		// false).
		builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
		builder.create<scf::YieldOp>(loc, ifRet);

		// If this is a legitimate coordinate in the slice.
		builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
		bodyBuilder(builder, loc, iv, ifRet);
		if (genYield) {
		builder.setInsertionPointToEnd(&ifOp.getThenRegion().front());
		builder.create<scf::YieldOp>(loc, ifRet);
		}
		}
		// Marks this special ifOp to avoid sparsification finalizing it.
		ifOp->setAttr(getLoopEmitterLoopAttrName(),
		StringAttr::get(builder.getContext(), "slice"));
		// Insertion point restored to after ifOp.
		SmallVector<Value> yields;
		// Increase induction variable.
		yields.push_back(builder.create<arith::AddIOp>(loc, iv, c1));
		yields.push_back(cont);
		yields.append(ifOp.getResults().begin(), ifOp.getResults().end());
		builder.create<scf::YieldOp>(loc, yields);
		});

		builder.setInsertionPointAfter(whileOp);
		return std::make_pair(whileOp, whileOp.getResults().drop_front(2));
		}
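In the spirit of the entry comments requested above, here is a plain C++ model of the loop `genSliceLvlTraverseLoop` emits. It is a sketch, not the MLIR code: `traverseSliceLevel` is a hypothetical name, `crd` stands in for `idxBuffer[tid][lvl]`, and, like the emitted loop, it checks only the upper bound `offset + size`, relying on `lo` already pointing past coordinates below `offset`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Plain C++ model of the scf.while that genSliceLvlTraverseLoop emits.
// Positions [lo, hi) are visited until the first coordinate at or beyond
// sliceHi = offset + size.
void traverseSliceLevel(const std::vector<std::size_t> &crd, std::size_t lo,
                        std::size_t hi, std::size_t offset, std::size_t size,
                        const std::function<void(std::size_t)> &body) {
  const std::size_t sliceHi = offset + size;
  bool cont = true;
  std::size_t pos = lo;
  // "before" region: keep going while not broken out and still in bounds.
  while (cont && pos < hi) {
    // "after" region: the scf.if runs the body only for in-slice coords.
    cont = crd[pos] < sliceHi;
    if (cont)
      body(pos);
    ++pos; // the induction variable is always incremented
  }
}
```

With coordinates {1, 3, 4, 7, 9}, positions [1, 5) and the slice [2, 6), the body runs for positions 1 and 2 (coordinates 3 and 4), and the loop stops at coordinate 7.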

		ValueRange LoopEmitter::genSliceAllLvlTraverseLoop(
		OpBuilder &builder, Location loc, Value offset, size_t tid, size_t lvl,
		size_t depth, ValueRange userReduc,
		llvm::function_ref<void(OpBuilder &, Location, Value,
		MutableArrayRef<Value>)>
		bodyBuilder) {

		Value c0 = constantIndex(builder, loc, 0);
		Value c1 = constantIndex(builder, loc, 1);
		Value c2 = constantIndex(builder, loc, 2);

		// TODO: it only works on all-compressed tensors.
		Value sPtrBuf = slicePtrBuffer[tid][lvl][depth];
		Value pSt = c2; // pointer starting index
		Value mSz = genIndexLoad(builder, loc, sPtrBuf, c0); // memSize

		auto forOp =
		scf::buildLoopNest(
		builder, loc, pSt, mSz, c2, userReduc,
		[this, c1, depth, tid, lvl, offset, sPtrBuf,
		bodyBuilder](OpBuilder &builder, Location loc, ValueRange ivs,
		ValueRange iterArgs) -> scf::ValueVector {
		// generate traversal for each level.
		Value loopLo = genIndexLoad(builder, loc, sPtrBuf, ivs.front());
		Value loopHi = genIndexLoad(
		builder, loc, sPtrBuf,
		builder.create<arith::AddIOp>(loc, ivs.front(), c1));
		return genSliceLvlTraverseLoop(builder, loc, loopLo, loopHi, offset,
		tid, lvl, depth, iterArgs, true,
		bodyBuilder)
		.second;
		})
		.loops.front();

		// Insert after the current for operation.
		builder.setInsertionPointAfter(forOp);
		return forOp.getResults();
		}
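The buffer layout that `genSliceAllLvlTraverseLoop` walks can be read off the stores in `genSliceBegin`: two metadata slots `[memSize, idx]` followed by `(pLo, pHi)` pairs, with `memSize` counting every used entry. A plain C++ sketch of the outer `scf.for`, using the hypothetical names `SlicePosBuffer` and `traverseAllSegments`:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Model of a slice position buffer (slicePtrBuffer[tid][lvl][depth]) as
// used in this diff: [memSize, idx, pLo0, pHi0, pLo1, pHi1, ...], where
// memSize counts every used entry including the two metadata slots.
using SlicePosBuffer = std::vector<std::size_t>;

// Mirrors the scf.for built by genSliceAllLvlTraverseLoop: step over the
// (pLo, pHi) pairs starting at index 2 and hand each range to `segment`.
void traverseAllSegments(
    const SlicePosBuffer &buf,
    const std::function<void(std::size_t, std::size_t)> &segment) {
  const std::size_t memSize = buf[0];
  for (std::size_t i = 2; i < memSize; i += 2)
    segment(buf[i], buf[i + 1]);
}
```

In the diff, each `(pLo, pHi)` range is then handed to `genSliceLvlTraverseLoop`. The initial buffer `[4, 0, 0, pHi]` written for the first level yields exactly one segment.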

		bool LoopEmitter::genSliceBegin(OpBuilder &builder, Location loc, size_t tid,
		size_t lvl) {

		Value c0 = constantIndex(builder, loc, 0);
		Value c1 = constantIndex(builder, loc, 1);
		Value c2 = constantIndex(builder, loc, 2);
		Value c3 = constantIndex(builder, loc, 3);
		Value c4 = constantIndex(builder, loc, 4);

		if (sliceFullyResolved(tid)) {
		// If the constraints on the tensor are fully resolved, we do not need to
		// generate a slice begin anymore; instead, we fall back to the TACO-based
		// algorithm to (co)iterate over the slice.
		Value pLoPtr =
		genIndexLoad(builder, loc, slicePtrBuffer[tid][lvl].back(), c1);
		pLoPtr = builder.create<arith::AddIOp>(loc, pLoPtr, c2);
		Value pHiPtr = builder.create<arith::AddIOp>(loc, pLoPtr, c1);
		pidxs[tid][lvl] =
		genIndexLoad(builder, loc, slicePtrBuffer[tid][lvl].back(), pLoPtr);
		highs[tid][lvl] =
		genIndexLoad(builder, loc, slicePtrBuffer[tid][lvl].back(), pHiPtr);
		return true;
		}

		// Only when the level is sorted, the next-non-empty slice can be computed
		// efficiently.
		assert(isOrderedDLT(dimTypes[tid][lvl]));
		if (isDenseDLT(dimTypes[tid][lvl]) || isSingletonDLT(dimTypes[tid][lvl]))
		llvm_unreachable("TODO: dense level should be easy to support, while "
		"singleton level requires more effort");

		assert(!dependentDimMap[tid][lvl].empty());
		assert(!sliceStack[tid].empty());

		const SliceInfo &sliceInfo = sliceStack[tid].back();
		auto baseEnc = getSparseTensorEncoding(sliceInfo.baseSlice.getType());

		Value size, minCoord, isNonEmpty;
		unsigned depth = 0;
		if (sliceInfo.isInitialTensor()) {
		// Input tensors that are themselves slices are not yet handled.
		if (baseEnc.isSlice())
		llvm_unreachable("TODO: not yet implemented");

		assert(lvl == 0); // must be reducing the affine expression on the first lvl.
		// Fills out slicePtrBuffer[tid][lvl][0] with [/*memSize=*/4, 0, 0, pHi].
		Value sPtrBuf = slicePtrBuffer[tid][0][0];
		Value pHi = genIndexLoad(builder, loc, ptrBuffer[tid][0], c1);
		builder.create<memref::StoreOp>(loc, c4, sPtrBuf, c0); // memSize = 4
		builder.create<memref::StoreOp>(loc, c0, sPtrBuf, c1); // index = 0
		builder.create<memref::StoreOp>(loc, c0, sPtrBuf, c2); // pLo = 0;
		builder.create<memref::StoreOp>(loc, pHi, sPtrBuf, c3); // loaded pHi.

		size = sliceSizes[tid][0][0];
		// This is a non-empty tensor if 0 < pHi.
		isNonEmpty =
		builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::ult, c0, pHi);
		// The minimal coordinate must be the first one on an ordered level.
		// FIXME: Technically, we should load the coordinate only when the slice is
		// non-empty. However, we assume that even for empty sparse tensors a
		// non-empty ptr/idx buffer is allocated for each level, so the load does
		// not go out of bounds; this avoids generating an ifOp here.
		minCoord = genIndexLoad(builder, loc, idxBuffer[tid][0], c0);
		depth = 1;
		} else {
		unsigned prevLvl = *sliceInfo.slicedOnLvl;
		assert(lvl >= prevLvl);
		if (lvl != prevLvl + 1) {
		// Either lvl = prevSlicedLvl, i.e., t[d0 + d1 + d2,...] (more than one
		// variable needs to be reduced on the same level),
		// or lvl > prevSliceLvl + 1, i.e., t[..., d2, d3 + d4] (there is a
		// simple dim expression in between).
		llvm_unreachable("TODO: not yet implemented");
		} else {
		assert(slicePtrBuffer[tid][prevLvl].size() == sliceInfo.depth);
		Value sPtrBuf = slicePtrBuffer[tid][lvl][0];

		SmallVector<Value, 3> reduc = {
		constantI1(builder, loc, false), // isNonEmpty
		dims[tid][lvl], // minCoord
		c2, // memSize
		};
		ValueRange result = genSliceAllLvlTraverseLoop(
		builder, loc, sliceInfo.offset, tid, prevLvl, sliceInfo.depth - 1,
		reduc,
		[this, c1, c2, tid, lvl, sPtrBuf](OpBuilder &builder, Location loc,
		Value iv,
		MutableArrayRef<Value> reduc) {
		Value &isNonEmpty = reduc[0];
		Value &minCoord = reduc[1];
		Value &curMemSize = reduc[2];

		Value pHi = builder.create<arith::AddIOp>(loc, iv, c1);
		Value sPLo = genIndexLoad(builder, loc, ptrBuffer[tid][lvl], iv);
		Value sPHi = genIndexLoad(builder, loc, ptrBuffer[tid][lvl], pHi);

		// isNonEmpty = isNonEmpty || lvlNonEmpty
		Value lvlNonEmpty = builder.create<arith::CmpIOp>(
		loc, arith::CmpIPredicate::ult, sPLo, sPHi);
		isNonEmpty =
		builder.create<arith::OrIOp>(loc, lvlNonEmpty, isNonEmpty);

		// Update minimal coordinate.
		auto ifNonEmpty = builder.create<scf::IfOp>(
		loc, builder.getIndexType(), lvlNonEmpty, true);
		{
		OpBuilder::InsertionGuard guard(builder);
		builder.setInsertionPointToStart(ifNonEmpty.thenBlock());
		Value curC =
		genIndexLoad(builder, loc, idxBuffer[tid][lvl], sPLo);
		Value isCurSmaller = builder.create<arith::CmpIOp>(
		aartbik: I first thought this was commented-out code ;-) So make it Generate: code
		loc, arith::CmpIPredicate::ult, curC, minCoord);
		Value newMin = builder.create<arith::SelectOp>(loc, isCurSmaller,
		curC, minCoord);
		builder.create<scf::YieldOp>(loc, newMin);
		builder.setInsertionPointToStart(ifNonEmpty.elseBlock());
		builder.create<scf::YieldOp>(loc, minCoord);
		}
		minCoord = ifNonEmpty.getResult(0);

		// Fills in the (sPLo, sPHi) pair for this position.
		builder.create<memref::StoreOp>(loc, sPLo, sPtrBuf, curMemSize);
		Value nxtMemSize =
		builder.create<arith::AddIOp>(loc, curMemSize, c1);
		builder.create<memref::StoreOp>(loc, sPHi, sPtrBuf, nxtMemSize);

		// curMemSize += 2
		curMemSize = builder.create<arith::AddIOp>(loc, curMemSize, c2);
		});

		size = sliceSizes[tid][lvl][0];
		isNonEmpty = result[0];
		minCoord = result[1];
		depth = 1;

		// Two metadata entries: [memSize, idx].
		// TODO: we might be able to use an SSA value for memSize here to avoid
		// memory operation.
		builder.create<memref::StoreOp>(loc, result[2], sPtrBuf, c0);
		builder.create<memref::StoreOp>(loc, c0, sPtrBuf, c1);
		}
		}

		assert(depth > 0 && size && isNonEmpty && minCoord);
		// Compute the minimal offset viable for a non-empty tensor:
		// offset = isNonEmpty && minCoord >= size ? minCoord - size + 1 : 0;
		// NOTE: minCoord is invalid when isNonEmpty = false, in which case
		// the computed slices are meaningless.
		// FIXME: support relative offset compute.
		Value geSize = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::uge,
		minCoord, size);
		Value pred = builder.create<arith::AndIOp>(loc, isNonEmpty, geSize);

		Value mp1 = builder.create<arith::AddIOp>(loc, minCoord, c1);
		Value mms = builder.create<arith::SubIOp>(loc, mp1, size);
		// This is the absolute offset relative to the underlying tensor.
		Value absOffset = builder.create<arith::SelectOp>(loc, pred, mms, c0);
		// This is the offset relative to the base slice.
		Value relOffset = absOffset;
		uint64_t dim = toOrigDim(baseEnc, lvl);
		Value newSlice = genExtractSliceWithOffsetOnDim(
		builder, loc, sliceInfo.baseSlice, relOffset, size, dim);
		sliceStack[tid].emplace_back(newSlice, minCoord, absOffset, isNonEmpty, lvl,
		depth);
		return false;
		}
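For the first level of an unsliced input, `genSliceBegin` reduces to a small computation. This hypothetical sketch (`SliceStart`, `beginSlice`) returns the state pushed onto `sliceStack`, assuming `crd` holds the level-0 coordinates in positions `[0, pHi)`; unlike the emitted code, it skips the load when the level is empty, so `minCoord` is simply 0 there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of the state genSliceBegin pushes for the first
// level of an unsliced input tensor.
struct SliceStart {
  bool isNonEmpty;
  std::size_t minCoord;
  std::size_t offset;
};

// `crd` stands in for the level-0 coordinates in positions [0, pHi).
SliceStart beginSlice(const std::vector<std::size_t> &crd, std::size_t size) {
  // The level is non-empty iff 0 < pHi.
  SliceStart s{!crd.empty(), 0, 0};
  if (!s.isNonEmpty)
    return s; // the emitted code loads crd[0] anyway (see the FIXME)
  // On an ordered level the first coordinate is the minimum.
  s.minCoord = crd.front();
  // offset = minCoord >= size ? minCoord - size + 1 : 0
  s.offset = s.minCoord >= size ? s.minCoord - size + 1 : 0;
  return s;
}
```

For coordinates {5, 8, 9} and size 3, the first slice starts at offset 3, the smallest offset whose window [3, 6) still contains coordinate 5.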

		void LoopEmitter::genSliceNextInduction(OpBuilder &builder, Location loc,
		const Operation *op, size_t tid,
		size_t lvl,
		SmallVectorImpl<Value> &operands,
		unsigned &retIdx) {
		if (!isCompressedDLT(dimTypes[tid][lvl]))
		llvm_unreachable("TODO");

		// Otherwise, generate code to compute the next non-empty slice.
		Value c0 = constantIndex(builder, loc, 0);
		Value c1 = constantIndex(builder, loc, 1);
		Value c2 = constantIndex(builder, loc, 2);

		auto whileOp = llvm::cast<scf::WhileOp>(op);
		SliceInfo &info = sliceStack[tid].back();
		assert(info.slicedOnLvl == lvl);

		//
		// We forward to the next non-empty slice by
		// if (minCoord > offset) {
		//   offset += 1
		// } else {
		//   minCoord = nextMinInSlice();
		//   offset = minCoord - size + 1;
		// }
		//
		// if (offset + size > parents.size)
		//   isNonEmpty = false;
		//
		Value absOffset = info.offset;
		// Resets slice pointers, as the resolved slices are invalidated after we
		// move forward to the next slice.
		for (unsigned i = 0; i <= lvl; i++)
		builder.create<memref::StoreOp>(loc, c0, slicePtrBuffer[tid][i].back(), c1);

		SmallVector<Value, 3> reduc = {info.minCoord, info.isNonEmpty, absOffset};
		SmallVector<Type, 3> types = getTypesFromValues(reduc);
		Value sPtrBuf = slicePtrBuffer[tid][lvl][info.depth - 1];
		Value fastPathP = builder.create<arith::CmpIOp>(
		loc, arith::CmpIPredicate::ugt, info.minCoord, absOffset);
		auto ifOp = builder.create<scf::IfOp>(loc, types, fastPathP, true);
		{
		OpBuilder::InsertionGuard guard(builder);
		// Take the fast path if minCoord > offset
		builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
		reduc[2] = builder.create<arith::AddIOp>(loc, absOffset, c1);
		// Yield offset + 1.
		builder.create<scf::YieldOp>(loc, reduc);

		// Else, take the slow path.
		builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
		reduc[2] = absOffset; // restore value.
		Value pSt = c2; // pointer starting index
		Value mSz = genIndexLoad(builder, loc, sPtrBuf, c0); // memSize
		reduc[0] = dims[tid][lvl]; // next min coord
		reduc[1] = constantI1(builder, loc, false); // isNonEmpty
		auto loopArgs = static_cast<ValueRange>(reduc).drop_back();
		auto forOp = scf::buildLoopNest(
		builder, loc, pSt, mSz, c2, loopArgs,
		[this, tid, lvl, c1, sPtrBuf,
		&info](OpBuilder &builder, Location loc, ValueRange ivs,
		ValueRange iterArgs) -> scf::ValueVector {
		Value curMinCoord = iterArgs[0];
		Value isNonEmpty = iterArgs[1];

		Type idxTp = builder.getIndexType();
		Value pLo = genIndexLoad(builder, loc, sPtrBuf, ivs.front());
		Value pHi =
		genIndexLoad(builder, loc, sPtrBuf,
		builder.create<arith::AddIOp>(loc, ivs.front(), c1));
		//
		// if pLo < pHi
		//   coord = load[pLo]
		//   if coord == minCoord
		//     pLo += 1
		//
		// if pLo < pHi
		//   curMinCoord = min(curMinCoord, load[pLo])
		//
		Value pred = builder.create<arith::CmpIOp>(
		loc, arith::CmpIPredicate::ult, pLo, pHi);
		auto advPLo = builder.create<scf::IfOp>(loc, idxTp, pred, true);
		/* if pLo < pHi */ {
		builder.setInsertionPointToStart(&advPLo.getThenRegion().front());
		// coord = load[pLo]
		Value coord = genIndexLoad(builder, loc, idxBuffer[tid][lvl], pLo);
		Value pred = builder.create<arith::CmpIOp>(
		loc, arith::CmpIPredicate::eq, coord, info.minCoord);
		auto ifEqual = builder.create<scf::IfOp>(loc, idxTp, pred, true);
		/* if coord == minCoord */ {
		builder.setInsertionPointToStart(
		&ifEqual.getThenRegion().front());
		Value newPlo = builder.create<arith::AddIOp>(loc, pLo, c1);
		// Updates the cache.
		builder.create<memref::StoreOp>(loc, newPlo, sPtrBuf,
		ivs.front());
		builder.create<scf::YieldOp>(loc, newPlo);
		}
		/* else coord != minCoord */ {
		builder.setInsertionPointToStart(
		&ifEqual.getElseRegion().front());
		builder.create<scf::YieldOp>(loc, pLo);
		}
		builder.setInsertionPointAfter(ifEqual);
		builder.create<scf::YieldOp>(loc, ifEqual.getResults());
		}
		/* else pLo >= pHi */ {
		builder.setInsertionPointToStart(&advPLo.getElseRegion().front());
		builder.create<scf::YieldOp>(loc, pLo);
		}

		builder.setInsertionPointAfter(advPLo);
		pLo = advPLo.getResult(0);
		Value lvlNonEmpty = builder.create<arith::CmpIOp>(
		loc, arith::CmpIPredicate::ult, pLo, pHi);
		// Update minCoords
		auto newMin =
		builder.create<scf::IfOp>(loc, idxTp, lvlNonEmpty, true);
		builder.setInsertionPointToStart(&newMin.getThenRegion().front());
		builder.create<scf::YieldOp>(
		loc, genIndexLoad(builder, loc, idxBuffer[tid][lvl], pLo));

		builder.setInsertionPointToStart(&newMin.getElseRegion().front());
		builder.create<scf::YieldOp>(loc, curMinCoord);
		builder.setInsertionPointAfter(newMin);

		// isNonEmpty = isNonEmpty || lvlNonEmpty
		isNonEmpty =
		builder.create<arith::OrIOp>(loc, lvlNonEmpty, isNonEmpty);
		curMinCoord = builder.create<arith::SelectOp>(
		loc,
		builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::ult,
		newMin.getResult(0), curMinCoord),
		newMin.getResult(0), curMinCoord);
		return {curMinCoord, isNonEmpty};
		});

		builder.setInsertionPointAfter(forOp.loops.front());
		// minOffset = minCoord + 1 >= size ? minCoord + 1 - size : c0
		Value tmp = builder.create<arith::AddIOp>(loc, forOp.results.front(), c1);
		Value minOffset = builder.create<arith::SubIOp>(
		loc, tmp, sliceSizes[tid][lvl][info.depth - 1]);
		Value p =
		builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::uge, tmp,
		sliceSizes[tid][lvl][info.depth - 1]);
		minOffset = builder.create<arith::SelectOp>(loc, p, minOffset, c0);
		SmallVector<Value, 3> yields;
		yields.assign(forOp.results.begin(), forOp.results.end());
		yields.push_back(minOffset);
		builder.create<scf::YieldOp>(loc, yields);
		}

		Value nextMinCoord = ifOp.getResults()[0];
		//// builder.create<vector::PrintOp>(loc, nextMinCoord);
		Value nextNonEmpty = ifOp.getResults()[1];

		// The next offset should be at least offset + 1.
		Value minOffset = ifOp.getResults()[2];
		Value nxOffset = builder.create<arith::AddIOp>(loc, info.offset, c1);
		aartbik: The next
		Value maxPred = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::ugt,
		minOffset, nxOffset);
		Value nextAbsOffset =
		builder.create<arith::SelectOp>(loc, maxPred, minOffset, nxOffset);

		Value sliceUB = builder.create<arith::AddIOp>(
		loc, nextAbsOffset, sliceSizes[tid][lvl][info.depth - 1]);

		// FIXME: this only works if the parent is the tensor; we should use the
		// parent's slice size + parent offset.
		assert(info.depth - 1 == 0);
		// nextNonEmpty = nextNonEmpty && slice upper bound <= parent upperbound.
		nextNonEmpty = builder.create<arith::AndIOp>(
		loc, nextNonEmpty,
		builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::ule, sliceUB,
		dims[tid][lvl]));
		// FIXME: compute relative offset.
		assert(info.depth - 1 == 0);
		Value nextRelOffset = nextAbsOffset;
		nextRelOffset =
		builder.create<arith::SelectOp>(loc, nextNonEmpty, nextRelOffset, c0);

		uint64_t dim =
		toOrigDim(getSparseTensorEncoding(tensors[tid].getType()), lvl);

		Value nextSlice = genExtractSliceWithOffsetOnDim(
		builder, loc, sliceStack[tid][sliceStack[tid].size() - 2].baseSlice,
		nextRelOffset, sliceSizes[tid][lvl][info.depth - 1], dim);

		operands.push_back(nextNonEmpty);
		operands.push_back(nextSlice);
		operands.push_back(nextMinCoord);
		operands.push_back(nextAbsOffset); // we push the absolute offset.

		// Update the slice stack.
		info.isNonEmpty = whileOp.getResult(retIdx++);
		info.baseSlice = whileOp.getResult(retIdx++);
		info.minCoord = whileOp.getResult(retIdx++);
		info.offset = whileOp.getResult(retIdx++);
		}
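The induction in `genSliceNextInduction` is the subtlest part of this block, so a reference model may help while reviewing it. The sketch is plain C++ under these assumptions: depth 1 only (the parent is the tensor, matching the `assert(info.depth - 1 == 0)` in the diff), `crd` stands in for `idxBuffer[tid][lvl]`, each `(pLo, pHi)` pair models one cached segment of the slice position buffer, and `SliceState`, `Segments` and `nextNonEmptySlice` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical state carried across iterations of the slice-driven loop
// (depth 1 only: the parent of the slice is the tensor itself).
struct SliceState {
  bool isNonEmpty;
  std::size_t minCoord;
  std::size_t offset;
};

using Segments = std::vector<std::pair<std::size_t, std::size_t>>;

// Plain C++ model of genSliceNextInduction. Each (pLo, pHi) in `segs` is
// advanced in place, like the memref store in the emitted code.
SliceState nextNonEmptySlice(const std::vector<std::size_t> &crd,
                             Segments &segs, const SliceState &cur,
                             std::size_t size, std::size_t dimSize) {
  // Fast path (minCoord > offset): sliding the window by one keeps
  // minCoord inside it, so only the offset changes.
  SliceState next = cur;
  std::size_t minOffset = cur.offset + 1;
  if (cur.minCoord <= cur.offset) {
    // Slow path: step past the old minimum in every segment and take the
    // smallest remaining head as the new minimum.
    next.minCoord = dimSize;
    next.isNonEmpty = false;
    for (auto &seg : segs) {
      std::size_t &pLo = seg.first;
      const std::size_t pHi = seg.second;
      if (pLo < pHi && crd[pLo] == cur.minCoord)
        ++pLo;
      if (pLo < pHi) {
        next.isNonEmpty = true;
        next.minCoord = std::min(next.minCoord, crd[pLo]);
      }
    }
    // minOffset = minCoord + 1 >= size ? minCoord + 1 - size : 0
    minOffset = next.minCoord + 1 >= size ? next.minCoord + 1 - size : 0;
  }
  // The next offset is at least offset + 1, and the window must still fit
  // inside the level.
  next.offset = std::max(minOffset, cur.offset + 1);
  next.isNonEmpty = next.isNonEmpty && next.offset + size <= dimSize;
  return next;
}
```

Driving it from the state `genSliceBegin` would produce for coordinates {1, 4, 6}, size 2 and a level size of 8 (offset 0) visits offsets 0, 1, 3, 4, 5 and 6, which are exactly the size-2 windows containing at least one coordinate; offset 2 is skipped by the slow path.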

		Operation *LoopEmitter::emitSliceDrivenLoopOverTensorAtDim(
		OpBuilder &builder, Location loc, size_t tid, size_t lvl,
		MutableArrayRef<Value> reduc) {
		assert(!sliceFullyResolved(tid));
		SliceInfo &sliceInfo = sliceStack[tid].back();
		assert(sliceInfo.slicedOnLvl == lvl);

		// NOTE: The order matters!
		constexpr size_t numMetaReduc = 4; // number of reductions maintained by us.
		SmallVector<Value> operands{sliceInfo.isNonEmpty, sliceInfo.baseSlice,
		sliceInfo.minCoord, sliceInfo.offset};
		// Append user-required reduction values.
		operands.append(reduc.begin(), reduc.end());
		assert(operands.size() == numMetaReduc + reduc.size());

		SmallVector<Type> types = getTypesFromValues(operands);

		auto whileOp = builder.create<scf::WhileOp>(
		loc, types, operands,
		/*beforeBuilder=*/
		[](OpBuilder &builder, Location loc, ValueRange args) {
		builder.create<scf::ConditionOp>(loc, /*isNonEmpty=*/args[0], args);
		},
		/*afterBuilder=*/
		[this, tid, lvl, reduc, &sliceInfo](OpBuilder &builder, Location loc,
		ValueRange args) {
		assert(args.size() == reduc.size() + numMetaReduc);
		sliceInfo.isNonEmpty = args[0];
		sliceInfo.baseSlice = args[1];
		sliceInfo.minCoord = args[2];
		sliceInfo.offset = args[3];
		// The slice offset is the coordinate.
		Value c = sliceInfo.offset;
		if (sliceInfo.depth > 1) {
		// Coord is the relative offset related to its parents.
		// Update c = absOffset[lvl][depth] - absOffset[lvl][depth - 1]
		llvm_unreachable("TODO: not yet implement");
		aartbikUnsubmitted Done Reply Inline Actions Sets (or Increment), but use one style aartbik: Sets (or Increment), but use one style
		}
		coord[tid][lvl] = c;

		for (unsigned i = 0, e = reduc.size(); i < e; i++)
		reduc[i] = args[i + numMetaReduc];
		});

		// Increments the number of resolved constraints on tid.
		sliceResolvedConstraints[tid]++;
		// Set the insertion point to while loop body.
		builder.setInsertionPointToEnd(&whileOp.getAfter().front());
		return whileOp;
		}
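The while loop built by `emitSliceDrivenLoopOverTensorAtDim` carries `isNonEmpty`, `minCoord`, and `offset` and exits as soon as no non-empty slice remains; the fragment just before it computes the next offset. The following is a minimal 1-D sketch in plain C++ (no MLIR) of that "begin"/"next" logic, reconstructed by reading the CHECK lines of `sparse_conv_2d_slice_based.mlir` rather than taken from the LoopEmitter itself. The function name `nonEmptySliceOffsets` is hypothetical, and the sketch assumes a single compressed level with sorted, unique coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D model of a slice-driven loop. Returns every offset o with
// o + size <= dimSize such that the window [o, o + size) contains at least one
// stored coordinate. `crd` holds the sorted, unique coordinates of one
// compressed level.
std::vector<std::size_t> nonEmptySliceOffsets(const std::vector<std::size_t> &crd,
                                              std::size_t dimSize,
                                              std::size_t size) {
  assert(std::is_sorted(crd.begin(), crd.end()));
  assert(size >= 1 && size <= dimSize);
  std::vector<std::size_t> offsets;

  // "Begin": pick the first window whose last element is the smallest stored
  // coordinate, or offset 0 when that coordinate is below `size`.
  std::size_t pos = 0; // Index of minCoord in crd.
  bool isNonEmpty = !crd.empty();
  std::size_t minCoord = isNonEmpty ? crd[0] : dimSize;
  std::size_t offset = minCoord >= size ? minCoord + 1 - size : 0;

  while (isNonEmpty) {
    offsets.push_back(offset); // Loop body: process the slice at `offset`.

    // "Next": while minCoord stays inside the window, just slide by one.
    std::size_t candidate = offset + 1;
    if (minCoord == offset) {
      // minCoord leaves the window; move on to the next stored coordinate.
      ++pos;
      isNonEmpty = pos < crd.size();
      minCoord = isNonEmpty ? crd[pos] : dimSize;
      candidate = minCoord + 1 >= size ? minCoord + 1 - size : 0;
    }
    std::size_t nextOffset = std::max(candidate, offset + 1);
    isNonEmpty = isNonEmpty && nextOffset + size <= dimSize;
    offset = nextOffset;
  }
  return offsets;
}
```

For coordinates {1, 5} in a dimension of size 8 with a window of size 3, the sketch yields offsets 0, 1, 3, 4, 5: offset 2 is skipped because the window [2, 5) holds no stored coordinate.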

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp

if (!reducValue.empty()) {
// merge the block before the yield op.		// merge the block before the yield op.
rewriter.mergeBlockBefore(srcBlock, &*rewriter.getInsertionPoint(), args);		rewriter.mergeBlockBefore(srcBlock, &*rewriter.getInsertionPoint(), args);
}		}

for (int64_t i = 0; i < rank; i++) {		for (int64_t i = 0; i < rank; i++) {
// Link the reduction chain. Note that the loop emitter updates the		// Link the reduction chain. Note that the loop emitter updates the
// reducValue in place.		// reducValue in place.
loopEmitter.exitCurrentLoop(rewriter, loc, reducValue);		loopEmitter.exitCurrentLoop(rewriter, loc, reducValue);
loopEmitter.exitCurrentLoopSeq();		loopEmitter.exitCurrentLoopSeq(rewriter, loc);
}		}

// Replace the foreach operator with the value returned by the outermost		// Replace the foreach operator with the value returned by the outermost
// for loop.		// for loop.
rewriter.replaceOp(op, reducValue);		rewriter.replaceOp(op, reducValue);
return success();		return success();
}		}
};		};

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp

	};			};

	/// A helper class that visits an affine expression and tries to find an			/// A helper class that visits an affine expression and tries to find an
	/// AffineDimExpr to which the corresponding iterator from a GenericOp matches			/// AffineDimExpr to which the corresponding iterator from a GenericOp matches
	/// the desired iterator type.			/// the desired iterator type.
	class AffineDimFinder : public AffineExprVisitor<AffineDimFinder> {			class AffineDimFinder : public AffineExprVisitor<AffineDimFinder> {
	public:			public:
	explicit AffineDimFinder(linalg::GenericOp op)			explicit AffineDimFinder(linalg::GenericOp op)
	: iterTypes(op.getIteratorTypesArray()) {}			: iterTypes(op.getIteratorTypes()) {}
	void visitDimExpr(AffineDimExpr expr) {			void visitDimExpr(AffineDimExpr expr) {
				aartbik: I know this was already there, but can we use `override` here to make it clearer that we are implementing the base visitor class? Or at the very least group all overrides into a `// Visitor method overrides. ...` section?
				Peiming: I added a comment. This is a non-virtual function, so I did not use `override` here.
	if (pickedDim == nullptr || pickIterType == iterTypes[expr.getPosition()]) {			if (pickedDim == nullptr ||
				pickIterType == iterTypes[expr.getPosition()]
				.cast<linalg::IteratorTypeAttr>()
				.getValue()) {
	pickedDim = expr;			pickedDim = expr;
	}			}
	}			}

	/// Set the desired iterator type that we want to pick.			/// Set the desired iterator type that we want to pick.
	void setPickedIterType(utils::IteratorType iterType) {			void setPickedIterType(utils::IteratorType iterType) {
	pickIterType = iterType;			pickIterType = iterType;
	}			}

	/// Get the desired AffineDimExpr.			/// Get the desired AffineDimExpr.
	AffineDimExpr getDimExpr() const { return pickedDim.cast<AffineDimExpr>(); }			AffineDimExpr getDimExpr() const { return pickedDim.cast<AffineDimExpr>(); }

	private:			private:
	/// The picked AffineDimExpr after visit.			/// The picked AffineDimExpr after visit.
	AffineExpr pickedDim;			AffineExpr pickedDim;
	/// The iterator type that we want.			/// The iterator type that we want.
	utils::IteratorType pickIterType;			utils::IteratorType pickIterType;
	/// The mapping between dim=>iterator type.			/// The mapping between dim=>iterator type.
	SmallVector<utils::IteratorType> iterTypes;			ArrayAttr iterTypes;
	};			};

	/// A helper struct that visits an affine expression and collects every			/// A helper struct that visits an affine expression and collects every
	/// AffineDimExpr that appears in it.			/// AffineDimExpr that appears in it.
	struct AffineDimCollector : public AffineExprVisitor<AffineDimFinder> {			struct AffineDimCollector : public AffineExprVisitor<AffineDimCollector> {
	void visitDimExpr(AffineDimExpr expr) { dims.push_back(expr); }			void visitDimExpr(AffineDimExpr expr) { dims.push_back(expr); }
	SmallVector<AffineDimExpr> dims;			SmallVector<AffineDimExpr> dims;
	};			};

	} // namespace			} // namespace
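Peiming's reply rests on the fact that `AffineExprVisitor` dispatches statically (CRTP) rather than through virtual functions, so a subclass's `visitDimExpr` hides the base method instead of overriding it. The sketch below illustrates that with a self-contained toy visitor loosely modeled on the same design; `Expr`, `ExprVisitor`, and `DimCollector` are hypothetical names, not MLIR classes. It also shows why the template argument must be the derived class itself, which is what the new column fixes for `AffineDimCollector`: the base casts `this` to `Derived *`.

```cpp
#include <cassert>
#include <vector>

// Toy expression tree: either a dimension `d<pos>` or a sum of two operands.
struct Expr {
  enum Kind { Dim, Add } kind;
  int pos = 0;               // Dimension position (Kind::Dim only).
  const Expr *lhs = nullptr; // Operands (Kind::Add only).
  const Expr *rhs = nullptr;
};

// CRTP base: dispatches to Derived via static_cast, with no virtual functions.
template <typename Derived>
struct ExprVisitor {
  void walk(const Expr &e) {
    if (e.kind == Expr::Add) {
      assert(e.lhs && e.rhs && "an Add node needs two operands");
      walk(*e.lhs);
      walk(*e.rhs);
      return;
    }
    // Resolved at compile time: calls Derived::visitDim if Derived declares
    // one (hiding the default below), otherwise the base no-op.
    static_cast<Derived *>(this)->visitDim(e);
  }
  void visitDim(const Expr &) {} // Default action: ignore the dimension.
};

// visitDim hides, rather than overrides, the base method; marking it
// `override` would be a compile error because nothing here is virtual.
struct DimCollector : ExprVisitor<DimCollector> {
  void visitDim(const Expr &e) { dims.push_back(e.pos); }
  std::vector<int> dims;
};
```

Walking `d0 + d2` with `DimCollector` collects `{0, 2}`. Had `DimCollector` derived from `ExprVisitor<SomeOtherVisitor>`, as the old column does with `AffineExprVisitor<AffineDimFinder>`, the `static_cast` would still compile (as long as that other visitor derives from the same base) but would treat the object as the wrong type, which is undefined behavior.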

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Sparse compiler analysis methods.			// Sparse compiler analysis methods.
	/// Ends a single loop in the current sequence. Returns the new value of needsUniv.			/// Ends a single loop in the current sequence. Returns the new value of needsUniv.
	static bool endLoop(CodegenEnv &env, RewriterBase &rewriter, Operation *loop,			static bool endLoop(CodegenEnv &env, RewriterBase &rewriter, Operation *loop,
	unsigned idx, unsigned li, bool needsUniv) {			unsigned idx, unsigned li, bool needsUniv) {
	// End a while-loop.			// End a while-loop.
	if (auto whileOp = dyn_cast<scf::WhileOp>(loop)) {			if (auto whileOp = dyn_cast<scf::WhileOp>(loop)) {
	finalizeWhileOp(env, rewriter, idx, needsUniv, env.lat(li).bits, whileOp);			finalizeWhileOp(env, rewriter, idx, needsUniv, env.lat(li).bits, whileOp);
	} else {			} else {
	needsUniv = false;			needsUniv = false;
	}			}
				aartbik: I would start with the same comment as in the else, and state it in the affirmative rather than the speculative: `// End either a for-loop or a while-loop that iterates over a slice.`

	env.genLoopBoundary([&](MutableArrayRef<Value> reduc) {			env.genLoopBoundary([&](MutableArrayRef<Value> reduc) {
	env.emitter().exitCurrentLoop(rewriter, env.op().getLoc(), reduc);			env.emitter().exitCurrentLoop(rewriter, env.op().getLoc(), reduc);
	return std::nullopt;			return std::nullopt;
	});			});

	return needsUniv;			return needsUniv;
	}			}

	/// Ends a loop sequence at the given level.			/// Ends a loop sequence at the given level.
	static void endLoopSeq(CodegenEnv &env, OpBuilder &builder, unsigned exp,			static void endLoopSeq(CodegenEnv &env, OpBuilder &builder, unsigned exp,
	unsigned at, unsigned idx, unsigned ldx) {			unsigned at, unsigned idx, unsigned ldx) {
	assert(env.getLoopIdxValue(idx) == nullptr);			assert(env.getLoopIdxValue(idx) == nullptr);
	env.emitter().exitCurrentLoopSeq();			env.emitter().exitCurrentLoopSeq(builder, env.op().getLoc());
	// Unmark bookkeeping of invariants and loop index.			// Unmark bookkeeping of invariants and loop index.
	genInvariants(env, builder, exp, ldx, /*atStart=*/false);			genInvariants(env, builder, exp, ldx, /*atStart=*/false);
	// Finalize access pattern expansion for sparse tensor output.			// Finalize access pattern expansion for sparse tensor output.
	genExpand(env, builder, at, /*atStart=*/false);			genExpand(env, builder, at, /*atStart=*/false);
	}			}

	/// Recursively generates code while computing iteration lattices in order			/// Recursively generates code while computing iteration lattices in order
	/// to manage the complexity of implementing co-iteration over unions			/// to manage the complexity of implementing co-iteration over unions

mlir/test/Dialect/SparseTensor/sparse_conv_2d_slice_based.mlir

This file was added.
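The three affine maps in this test describe an unpadded, stride-1 2-D convolution (without kernel flip): output `(d0, d1)` accumulates `input(d0 + d2, d1 + d3) * filter(d2, d3)`, so an 8x8 input and a 3x3 filter give the 6x6 output. A dense reference such as the hypothetical `conv2dRef` below can help cross-check the sparse kernel's results; it is an illustration only, not part of the test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Dense reference for out(d0, d1) += in(d0 + d2, d1 + d3) * filter(d2, d3).
Matrix conv2dRef(const Matrix &in, const Matrix &filter) {
  const std::size_t inRows = in.size(), inCols = in.at(0).size();
  const std::size_t kRows = filter.size(), kCols = filter.at(0).size();
  assert(kRows <= inRows && kCols <= inCols && "filter larger than input");
  const std::size_t outRows = inRows - kRows + 1;
  const std::size_t outCols = inCols - kCols + 1;
  Matrix out(outRows, std::vector<int>(outCols, 0));
  for (std::size_t d0 = 0; d0 < outRows; ++d0)
    for (std::size_t d1 = 0; d1 < outCols; ++d1)
      for (std::size_t d2 = 0; d2 < kRows; ++d2)
        for (std::size_t d3 = 0; d3 < kCols; ++d3)
          out[d0][d1] += in[d0 + d2][d1 + d3] * filter[d2][d3];
  return out;
}
```

Output positions whose input window holds no stored entries sum to zero here; the sparse kernel instead skips them and only inserts values for non-empty slices.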

				// RUN: mlir-opt %s --sparsification="enable-slice-affine=true" --cse | FileCheck %s

				#map = affine_map<(d0, d1, d2, d3) -> (d0 + d2, d1 + d3)>
				#map1 = affine_map<(d0, d1, d2, d3) -> (d2, d3)>
				#map2 = affine_map<(d0, d1, d2, d3) -> (d0, d1)>

				#DCSR = #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>
				// CHECK-LABEL: func.func @conv2d_all_sparse_CSR(
				// CHECK-SAME: %[[VAL_0:.*]]: tensor<8x8xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>,
				// CHECK-SAME: %[[VAL_1:.*]]: tensor<3x3xi32>) -> tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>> {
				// CHECK: %[[VAL_2:.*]] = arith.constant 8 : index
				// CHECK: %[[VAL_3:.*]] = arith.constant 3 : index
				// CHECK: %[[VAL_4:.*]] = arith.constant 4 : index
				// CHECK: %[[VAL_5:.*]] = arith.constant 0 : index
				// CHECK: %[[VAL_6:.*]] = arith.constant 1 : index
				// CHECK: %[[VAL_7:.*]] = arith.constant 2 : index
				// CHECK: %[[VAL_8:.*]] = arith.constant 0 : i32
				// CHECK: %[[VAL_9:.*]] = arith.constant true
				// CHECK: %[[VAL_10:.*]] = arith.constant false
				// CHECK: %[[VAL_11:.*]] = bufferization.alloc_tensor() : tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: %[[VAL_12:.*]] = sparse_tensor.pointers %[[VAL_0]] {dimension = 0 : index}
				// CHECK: %[[VAL_13:.*]] = sparse_tensor.indices %[[VAL_0]] {dimension = 0 : index}
				// CHECK: %[[VAL_14:.*]] = sparse_tensor.pointers %[[VAL_0]] {dimension = 1 : index}
				// CHECK: %[[VAL_15:.*]] = sparse_tensor.indices %[[VAL_0]] {dimension = 1 : index}
				// CHECK: %[[VAL_16:.*]] = sparse_tensor.values %[[VAL_0]]
				// CHECK: %[[VAL_17:.*]] = bufferization.to_memref %[[VAL_1]] : memref<3x3xi32>
				// CHECK: %[[VAL_18:.*]] = memref.alloca(%[[VAL_4]]) : memref<?xindex>
				// CHECK: %[[VAL_19:.*]] = memref.alloca(%[[VAL_2]]) : memref<?xindex>
				// CHECK: %[[VAL_20:.*]] = memref.load %[[VAL_12]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_4]], %[[VAL_18]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_5]], %[[VAL_18]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_5]], %[[VAL_18]]{{\[}}%[[VAL_7]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_20]], %[[VAL_18]]{{\[}}%[[VAL_3]]] : memref<?xindex>
				// CHECK: %[[VAL_21:.*]] = arith.cmpi ugt, %[[VAL_20]], %[[VAL_5]] : index
				// CHECK: %[[VAL_22:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_23:.*]] = arith.cmpi uge, %[[VAL_22]], %[[VAL_3]] : index
				// CHECK: %[[VAL_24:.*]] = arith.andi %[[VAL_21]], %[[VAL_23]] : i1
				// CHECK: %[[VAL_25:.*]] = arith.addi %[[VAL_22]], %[[VAL_6]] : index
				// CHECK: %[[VAL_26:.*]] = arith.subi %[[VAL_25]], %[[VAL_3]] : index
				// CHECK: %[[VAL_27:.*]] = arith.select %[[VAL_24]], %[[VAL_26]], %[[VAL_5]] : index
				// CHECK: %[[VAL_28:.*]] = tensor.extract_slice %[[VAL_0]]{{\[}}%[[VAL_27]], 0] {{\[}}%[[VAL_3]], 8] [1, 1]
				// CHECK: %[[VAL_29:.*]]:5 = scf.while (%[[VAL_30:.*]] = %[[VAL_21]],
				// CHECK-SAME: %[[VAL_31:.*]] = %[[VAL_28]],
				// CHECK-SAME: %[[VAL_32:.*]] = %[[VAL_22]],
				// CHECK-SAME: %[[VAL_33:.*]] = %[[VAL_27]],
				// CHECK-SAME: %[[VAL_34:.*]] = %[[VAL_11]])
				// CHECK: scf.condition(%[[VAL_30]]) %[[VAL_30]], %[[VAL_31]], %[[VAL_32]], %[[VAL_33]], %[[VAL_34]]
				// CHECK: } do {
				// CHECK: ^bb0(%[[VAL_35:.*]]: i1,
				// CHECK-SAME: %[[VAL_36:.*]]: tensor<?x8xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (?, ?, 1), (0, 8, 1) ] }>>,
				// CHECK-SAME: %[[VAL_37:.*]]: index, %[[VAL_38:.*]]: index,
				// CHECK-SAME: %[[VAL_39:.*]]: tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>):
				// CHECK: %[[VAL_40:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				//
				// !!!!! Code below computes the first non-empty slice (begin).
				//
				// CHECK: %[[VAL_41:.*]]:3 = scf.for %[[VAL_42:.*]] = %[[VAL_7]] to %[[VAL_40]] step %[[VAL_7]] iter_args(%[[VAL_43:.*]] = %[[VAL_10]], %[[VAL_44:.*]] = %[[VAL_2]], %[[VAL_45:.*]] = %[[VAL_7]]) -> (i1, index, index) {
				// CHECK: %[[VAL_46:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_42]]] : memref<?xindex>
				// CHECK: %[[VAL_47:.*]] = arith.addi %[[VAL_42]], %[[VAL_6]] : index
				// CHECK: %[[VAL_48:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_47]]] : memref<?xindex>
				// CHECK: %[[VAL_49:.*]] = arith.addi %[[VAL_38]], %[[VAL_3]] : index
				// CHECK: %[[VAL_50:.*]]:5 = scf.while (%[[VAL_51:.*]] = %[[VAL_46]], %[[VAL_52:.*]] = %[[VAL_9]], %[[VAL_53:.*]] = %[[VAL_43]], %[[VAL_54:.*]] = %[[VAL_44]], %[[VAL_55:.*]] = %[[VAL_45]]) : (index, i1, i1, index, index) -> (index, i1, i1, index, index) {
				// CHECK: %[[VAL_56:.*]] = arith.cmpi ult, %[[VAL_51]], %[[VAL_48]] : index
				// CHECK: %[[VAL_57:.*]] = arith.andi %[[VAL_52]], %[[VAL_56]] : i1
				// CHECK: scf.condition(%[[VAL_57]]) %[[VAL_51]], %[[VAL_52]], %[[VAL_53]], %[[VAL_54]], %[[VAL_55]] : index, i1, i1, index, index
				// CHECK: } do {
				// CHECK: ^bb0(%[[VAL_58:.*]]: index, %[[VAL_59:.*]]: i1, %[[VAL_60:.*]]: i1, %[[VAL_61:.*]]: index, %[[VAL_62:.*]]: index):
				// CHECK: %[[VAL_63:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_58]]] : memref<?xindex>
				// CHECK: %[[VAL_64:.*]] = arith.cmpi ult, %[[VAL_63]], %[[VAL_49]] : index
				// CHECK: %[[VAL_65:.*]]:3 = scf.if %[[VAL_64]] -> (i1, index, index) {
				// CHECK: %[[VAL_66:.*]] = arith.addi %[[VAL_58]], %[[VAL_6]] : index
				// CHECK: %[[VAL_67:.*]] = memref.load %[[VAL_14]]{{\[}}%[[VAL_58]]] : memref<?xindex>
				// CHECK: %[[VAL_68:.*]] = memref.load %[[VAL_14]]{{\[}}%[[VAL_66]]] : memref<?xindex>
				// CHECK: %[[VAL_69:.*]] = arith.cmpi ult, %[[VAL_67]], %[[VAL_68]] : index
				// CHECK: %[[VAL_70:.*]] = arith.ori %[[VAL_69]], %[[VAL_60]] : i1
				// CHECK: %[[VAL_71:.*]] = scf.if %[[VAL_69]] -> (index) {
				// CHECK: %[[VAL_72:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_67]]] : memref<?xindex>
				// CHECK: %[[VAL_73:.*]] = arith.cmpi ult, %[[VAL_72]], %[[VAL_61]] : index
				// CHECK: %[[VAL_74:.*]] = arith.select %[[VAL_73]], %[[VAL_72]], %[[VAL_61]] : index
				// CHECK: scf.yield %[[VAL_74]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_61]] : index
				// CHECK: }
				// CHECK: memref.store %[[VAL_67]], %[[VAL_19]]{{\[}}%[[VAL_62]]] : memref<?xindex>
				// CHECK: %[[VAL_75:.*]] = arith.addi %[[VAL_62]], %[[VAL_6]] : index
				// CHECK: memref.store %[[VAL_68]], %[[VAL_19]]{{\[}}%[[VAL_75]]] : memref<?xindex>
				// CHECK: %[[VAL_76:.*]] = arith.addi %[[VAL_62]], %[[VAL_7]] : index
				// CHECK: scf.yield %[[VAL_70]], %[[VAL_77:.*]], %[[VAL_76]] : i1, index, index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_60]], %[[VAL_61]], %[[VAL_62]] : i1, index, index
				// CHECK: } {"Emitted from" = "slice"}
				// CHECK: %[[VAL_78:.*]] = arith.addi %[[VAL_58]], %[[VAL_6]] : index
				// CHECK: scf.yield %[[VAL_78]], %[[VAL_64]], %[[VAL_79:.*]]#0, %[[VAL_79]]#1, %[[VAL_79]]#2 : index, i1, i1, index, index
				// CHECK: }
				// CHECK: scf.yield %[[VAL_80:.*]]#2, %[[VAL_80]]#3, %[[VAL_80]]#4 : i1, index, index
				// CHECK: }
				// CHECK: memref.store %[[VAL_81:.*]]#2, %[[VAL_19]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_5]], %[[VAL_19]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: %[[VAL_82:.*]] = arith.cmpi uge, %[[VAL_81]]#1, %[[VAL_3]] : index
				// CHECK: %[[VAL_83:.*]] = arith.andi %[[VAL_81]]#0, %[[VAL_82]] : i1
				// CHECK: %[[VAL_84:.*]] = arith.addi %[[VAL_81]]#1, %[[VAL_6]] : index
				// CHECK: %[[VAL_85:.*]] = arith.subi %[[VAL_84]], %[[VAL_3]] : index
				// CHECK: %[[VAL_86:.*]] = arith.select %[[VAL_83]], %[[VAL_85]], %[[VAL_5]] : index
				// CHECK: %[[VAL_87:.*]] = tensor.extract_slice %[[VAL_36]][0, %[[VAL_86]]] {{\[}}%[[VAL_2]], %[[VAL_3]]] [1, 1] : tensor<?x8xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (?, ?, 1), (0, 8, 1) ] }>> to tensor<?x?xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (0, ?, 1), (?, ?, 1) ] }>>
				// CHECK: %[[VAL_88:.*]]:5 = scf.while (%[[VAL_89:.*]] = %[[VAL_81]]#0, %[[VAL_90:.*]] = %[[VAL_87]], %[[VAL_91:.*]] = %[[VAL_81]]#1, %[[VAL_92:.*]] = %[[VAL_86]], %[[VAL_93:.*]] = %[[VAL_39]]) : (i1, tensor<?x?xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (0, ?, 1), (?, ?, 1) ] }>>, index, index, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>) -> (i1, tensor<?x?xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (0, ?, 1), (?, ?, 1) ] }>>, index, index, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>) {
				// CHECK: scf.condition(%[[VAL_89]]) %[[VAL_89]], %[[VAL_90]], %[[VAL_91]], %[[VAL_92]], %[[VAL_93]] : i1, tensor<?x?xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (0, ?, 1), (?, ?, 1) ] }>>, index, index, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } do {
				// CHECK: ^bb0(%[[VAL_94:.*]]: i1, %[[VAL_95:.*]]: tensor<?x?xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (0, ?, 1), (?, ?, 1) ] }>>, %[[VAL_96:.*]]: index, %[[VAL_97:.*]]: index, %[[VAL_98:.*]]: tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>):
				// CHECK: %[[VAL_99:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: %[[VAL_100:.*]] = arith.addi %[[VAL_99]], %[[VAL_7]] : index
				// CHECK: %[[VAL_101:.*]] = arith.addi %[[VAL_100]], %[[VAL_6]] : index
				// CHECK: %[[VAL_102:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_100]]] : memref<?xindex>
				// CHECK: %[[VAL_103:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_101]]] : memref<?xindex>
				// CHECK: %[[VAL_104:.*]] = arith.addi %[[VAL_38]], %[[VAL_3]] : index
				// CHECK: %[[VAL_105:.*]]:4 = scf.while (%[[VAL_106:.*]] = %[[VAL_102]], %[[VAL_107:.*]] = %[[VAL_9]], %[[VAL_108:.*]] = %[[VAL_8]], %[[VAL_109:.*]] = %[[VAL_98]]) : (index, i1, i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>) -> (index, i1, i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>) {
				// CHECK: %[[VAL_110:.*]] = arith.cmpi ult, %[[VAL_106]], %[[VAL_103]] : index
				// CHECK: %[[VAL_111:.*]] = arith.andi %[[VAL_107]], %[[VAL_110]] : i1
				// CHECK: scf.condition(%[[VAL_111]]) %[[VAL_106]], %[[VAL_107]], %[[VAL_108]], %[[VAL_109]] : index, i1, i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } do {
				// CHECK: ^bb0(%[[VAL_112:.*]]: index, %[[VAL_113:.*]]: i1, %[[VAL_114:.*]]: i32, %[[VAL_115:.*]]: tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>):
				// CHECK: %[[VAL_116:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_112]]] : memref<?xindex>
				// CHECK: %[[VAL_117:.*]] = arith.cmpi ult, %[[VAL_116]], %[[VAL_104]] : index
				// CHECK: %[[VAL_118:.*]]:2 = scf.if %[[VAL_117]] -> (i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>) {
				// CHECK: %[[VAL_119:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_112]]] : memref<?xindex>
				// CHECK: %[[VAL_120:.*]] = arith.subi %[[VAL_119]], %[[VAL_38]] : index
				// CHECK: %[[VAL_121:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: %[[VAL_122:.*]] = arith.addi %[[VAL_121]], %[[VAL_7]] : index
				// CHECK: %[[VAL_123:.*]] = arith.addi %[[VAL_122]], %[[VAL_6]] : index
				// CHECK: %[[VAL_124:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_122]]] : memref<?xindex>
				// CHECK: %[[VAL_125:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_123]]] : memref<?xindex>
				// CHECK: %[[VAL_126:.*]] = arith.addi %[[VAL_97]], %[[VAL_3]] : index
				// CHECK: %[[VAL_127:.*]]:4 = scf.while (%[[VAL_128:.*]] = %[[VAL_124]], %[[VAL_129:.*]] = %[[VAL_9]], %[[VAL_130:.*]] = %[[VAL_114]], %[[VAL_131:.*]] = %[[VAL_115]]) : (index, i1, i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>) -> (index, i1, i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>) {
				// CHECK: %[[VAL_132:.*]] = arith.cmpi ult, %[[VAL_128]], %[[VAL_125]] : index
				// CHECK: %[[VAL_133:.*]] = arith.andi %[[VAL_129]], %[[VAL_132]] : i1
				// CHECK: scf.condition(%[[VAL_133]]) %[[VAL_128]], %[[VAL_129]], %[[VAL_130]], %[[VAL_131]] : index, i1, i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } do {
				//
				// !!!!! Code below is the actual convolution kernel.
				//
				// CHECK: ^bb0(%[[VAL_134:.*]]: index, %[[VAL_135:.*]]: i1, %[[VAL_136:.*]]: i32, %[[VAL_137:.*]]: tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>):
				// CHECK: %[[VAL_138:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_134]]] : memref<?xindex>
				// CHECK: %[[VAL_139:.*]] = arith.cmpi ult, %[[VAL_138]], %[[VAL_126]] : index
				// CHECK: %[[VAL_140:.*]]:2 = scf.if %[[VAL_139]] -> (i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>) {
				// CHECK: %[[VAL_141:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_134]]] : memref<?xindex>
				// CHECK: %[[VAL_142:.*]] = arith.subi %[[VAL_141]], %[[VAL_97]] : index
				// CHECK: %[[VAL_143:.*]] = memref.load %[[VAL_16]]{{\[}}%[[VAL_134]]] : memref<?xi32>
				// CHECK: %[[VAL_144:.*]] = memref.load %[[VAL_17]]{{\[}}%[[VAL_120]], %[[VAL_142]]] : memref<3x3xi32>
				// CHECK: %[[VAL_145:.*]] = arith.muli %[[VAL_143]], %[[VAL_144]] : i32
				// CHECK: %[[VAL_146:.*]] = arith.addi %[[VAL_136]], %[[VAL_145]] : i32
				// CHECK: scf.yield %[[VAL_146]], %[[VAL_137]] : i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_136]], %[[VAL_137]] : i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } {"Emitted from" = "slice"}
				// CHECK: %[[VAL_147:.*]] = arith.addi %[[VAL_134]], %[[VAL_6]] : index
				// CHECK: scf.yield %[[VAL_147]], %[[VAL_139]], %[[VAL_148:.*]]#0, %[[VAL_148]]#1 : index, i1, i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } attributes {"Emitted from" = "linalg.generic"}
				// CHECK: %[[VAL_149:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: %[[VAL_150:.*]] = arith.addi %[[VAL_149]], %[[VAL_7]] : index
				// CHECK: memref.store %[[VAL_150]], %[[VAL_19]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_151:.*]]#2, %[[VAL_151]]#3 : i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_114]], %[[VAL_115]] : i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } {"Emitted from" = "slice"}
				// CHECK: %[[VAL_152:.*]] = arith.addi %[[VAL_112]], %[[VAL_6]] : index
				// CHECK: scf.yield %[[VAL_152]], %[[VAL_117]], %[[VAL_153:.*]]#0, %[[VAL_153]]#1 : index, i1, i32, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } attributes {"Emitted from" = "linalg.generic"}
				// CHECK: %[[VAL_154:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: %[[VAL_155:.*]] = arith.addi %[[VAL_154]], %[[VAL_7]] : index
				// CHECK: memref.store %[[VAL_155]], %[[VAL_18]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: %[[VAL_156:.*]] = sparse_tensor.insert %[[VAL_157:.*]]#2 into %[[VAL_157]]#3{{\[}}%[[VAL_38]], %[[VAL_97]]] : tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: memref.store %[[VAL_5]], %[[VAL_18]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_5]], %[[VAL_19]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: %[[VAL_158:.*]] = arith.cmpi ugt, %[[VAL_96]], %[[VAL_97]] : index
				// CHECK: %[[VAL_159:.*]]:3 = scf.if %[[VAL_158]] -> (index, i1, index) {
				// CHECK: %[[VAL_160:.*]] = arith.addi %[[VAL_97]], %[[VAL_6]] : index
				// CHECK: scf.yield %[[VAL_96]], %[[VAL_94]], %[[VAL_160]] : index, i1, index
				// CHECK: } else {
				// CHECK: %[[VAL_161:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_162:.*]]:2 = scf.for %[[VAL_163:.*]] = %[[VAL_7]] to %[[VAL_161]] step %[[VAL_7]] iter_args(%[[VAL_164:.*]] = %[[VAL_2]], %[[VAL_165:.*]] = %[[VAL_10]]) -> (index, i1) {
				// CHECK: %[[VAL_166:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_163]]] : memref<?xindex>
				// CHECK: %[[VAL_167:.*]] = arith.addi %[[VAL_163]], %[[VAL_6]] : index
				// CHECK: %[[VAL_168:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_167]]] : memref<?xindex>
				// CHECK: %[[VAL_169:.*]] = arith.cmpi ult, %[[VAL_166]], %[[VAL_168]] : index
				// CHECK: %[[VAL_170:.*]] = scf.if %[[VAL_169]] -> (index) {
				// CHECK: %[[VAL_171:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_166]]] : memref<?xindex>
				// CHECK: %[[VAL_172:.*]] = arith.cmpi eq, %[[VAL_171]], %[[VAL_96]] : index
				// CHECK: %[[VAL_173:.*]] = scf.if %[[VAL_172]] -> (index) {
				// CHECK: %[[VAL_174:.*]] = arith.addi %[[VAL_166]], %[[VAL_6]] : index
				// CHECK: memref.store %[[VAL_174]], %[[VAL_19]]{{\[}}%[[VAL_163]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_174]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_166]] : index
				// CHECK: }
				// CHECK: scf.yield %[[VAL_175:.*]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_166]] : index
				// CHECK: }
				// CHECK: %[[VAL_176:.*]] = arith.cmpi ult, %[[VAL_177:.*]], %[[VAL_168]] : index
				// CHECK: %[[VAL_178:.*]] = scf.if %[[VAL_176]] -> (index) {
				// CHECK: %[[VAL_179:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_177]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_179]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_164]] : index
				// CHECK: }
				// CHECK: %[[VAL_180:.*]] = arith.ori %[[VAL_176]], %[[VAL_165]] : i1
				// CHECK: %[[VAL_181:.*]] = arith.cmpi ult, %[[VAL_182:.*]], %[[VAL_164]] : index
				// CHECK: %[[VAL_183:.*]] = arith.select %[[VAL_181]], %[[VAL_182]], %[[VAL_164]] : index
				// CHECK: scf.yield %[[VAL_183]], %[[VAL_180]] : index, i1
				// CHECK: }
				// CHECK: %[[VAL_184:.*]] = arith.addi %[[VAL_185:.*]]#0, %[[VAL_6]] : index
				// CHECK: %[[VAL_186:.*]] = arith.subi %[[VAL_184]], %[[VAL_3]] : index
				// CHECK: %[[VAL_187:.*]] = arith.cmpi uge, %[[VAL_184]], %[[VAL_3]] : index
				// CHECK: %[[VAL_188:.*]] = arith.select %[[VAL_187]], %[[VAL_186]], %[[VAL_5]] : index
				// CHECK: scf.yield %[[VAL_185]]#0, %[[VAL_185]]#1, %[[VAL_188]] : index, i1, index
				// CHECK: }
				//
				// !!!!! Code below advances to the next non-empty slice (next).
				//
				// CHECK: %[[VAL_189:.*]] = arith.addi %[[VAL_97]], %[[VAL_6]] : index
				// CHECK: %[[VAL_190:.*]] = arith.cmpi ugt, %[[VAL_191:.*]]#2, %[[VAL_189]] : index
				// CHECK: %[[VAL_192:.*]] = arith.select %[[VAL_190]], %[[VAL_191]]#2, %[[VAL_189]] : index
				// CHECK: %[[VAL_193:.*]] = arith.addi %[[VAL_192]], %[[VAL_3]] : index
				// CHECK: %[[VAL_194:.*]] = arith.cmpi ule, %[[VAL_193]], %[[VAL_2]] : index
				// CHECK: %[[VAL_195:.*]] = arith.andi %[[VAL_191]]#1, %[[VAL_194]] : i1
				// CHECK: %[[VAL_196:.*]] = arith.select %[[VAL_195]], %[[VAL_192]], %[[VAL_5]] : index
				// CHECK: %[[VAL_197:.*]] = tensor.extract_slice %[[VAL_36]][0, %[[VAL_196]]] {{\[}}%[[VAL_2]], %[[VAL_3]]] [1, 1] : tensor<?x8xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (?, ?, 1), (0, 8, 1) ] }>> to tensor<?x?xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (0, ?, 1), (?, ?, 1) ] }>>
				// CHECK: scf.yield %[[VAL_195]], %[[VAL_197]], %[[VAL_191]]#0, %[[VAL_192]], %[[VAL_156]] : i1, tensor<?x?xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (0, ?, 1), (?, ?, 1) ] }>>, index, index, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } attributes {"Emitted from" = "linalg.generic"}
				// CHECK: memref.store %[[VAL_5]], %[[VAL_18]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: %[[VAL_198:.*]] = arith.cmpi ugt, %[[VAL_37]], %[[VAL_38]] : index
				// CHECK: %[[VAL_199:.*]]:3 = scf.if %[[VAL_198]] -> (index, i1, index) {
				// CHECK: %[[VAL_200:.*]] = arith.addi %[[VAL_38]], %[[VAL_6]] : index
				// CHECK: scf.yield %[[VAL_37]], %[[VAL_35]], %[[VAL_200]] : index, i1, index
				// CHECK: } else {
				// CHECK: %[[VAL_201:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_202:.*]]:2 = scf.for %[[VAL_203:.*]] = %[[VAL_7]] to %[[VAL_201]] step %[[VAL_7]] iter_args(%[[VAL_204:.*]] = %[[VAL_2]], %[[VAL_205:.*]] = %[[VAL_10]]) -> (index, i1) {
				// CHECK: %[[VAL_206:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_203]]] : memref<?xindex>
				// CHECK: %[[VAL_207:.*]] = arith.addi %[[VAL_203]], %[[VAL_6]] : index
				// CHECK: %[[VAL_208:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_207]]] : memref<?xindex>
				// CHECK: %[[VAL_209:.*]] = arith.cmpi ult, %[[VAL_206]], %[[VAL_208]] : index
				// CHECK: %[[VAL_210:.*]] = scf.if %[[VAL_209]] -> (index) {
				// CHECK: %[[VAL_211:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_206]]] : memref<?xindex>
				// CHECK: %[[VAL_212:.*]] = arith.cmpi eq, %[[VAL_211]], %[[VAL_37]] : index
				// CHECK: %[[VAL_213:.*]] = scf.if %[[VAL_212]] -> (index) {
				// CHECK: %[[VAL_214:.*]] = arith.addi %[[VAL_206]], %[[VAL_6]] : index
				// CHECK: memref.store %[[VAL_214]], %[[VAL_18]]{{\[}}%[[VAL_203]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_214]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_206]] : index
				// CHECK: }
				// CHECK: scf.yield %[[VAL_215:.*]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_206]] : index
				// CHECK: }
				// CHECK: %[[VAL_216:.*]] = arith.cmpi ult, %[[VAL_217:.*]], %[[VAL_208]] : index
				// CHECK: %[[VAL_218:.*]] = scf.if %[[VAL_216]] -> (index) {
				// CHECK: %[[VAL_219:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_217]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_219]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_204]] : index
				// CHECK: }
				// CHECK: %[[VAL_220:.*]] = arith.ori %[[VAL_216]], %[[VAL_205]] : i1
				// CHECK: %[[VAL_221:.*]] = arith.cmpi ult, %[[VAL_222:.*]], %[[VAL_204]] : index
				// CHECK: %[[VAL_223:.*]] = arith.select %[[VAL_221]], %[[VAL_222]], %[[VAL_204]] : index
				// CHECK: scf.yield %[[VAL_223]], %[[VAL_220]] : index, i1
				// CHECK: }
				// CHECK: %[[VAL_224:.*]] = arith.addi %[[VAL_225:.*]]#0, %[[VAL_6]] : index
				// CHECK: %[[VAL_226:.*]] = arith.subi %[[VAL_224]], %[[VAL_3]] : index
				// CHECK: %[[VAL_227:.*]] = arith.cmpi uge, %[[VAL_224]], %[[VAL_3]] : index
				// CHECK: %[[VAL_228:.*]] = arith.select %[[VAL_227]], %[[VAL_226]], %[[VAL_5]] : index
				// CHECK: scf.yield %[[VAL_225]]#0, %[[VAL_225]]#1, %[[VAL_228]] : index, i1, index
				// CHECK: }
				// CHECK: %[[VAL_229:.*]] = arith.addi %[[VAL_38]], %[[VAL_6]] : index
				// CHECK: %[[VAL_230:.*]] = arith.cmpi ugt, %[[VAL_231:.*]]#2, %[[VAL_229]] : index
				// CHECK: %[[VAL_232:.*]] = arith.select %[[VAL_230]], %[[VAL_231]]#2, %[[VAL_229]] : index
				// CHECK: %[[VAL_233:.*]] = arith.addi %[[VAL_232]], %[[VAL_3]] : index
				// CHECK: %[[VAL_234:.*]] = arith.cmpi ule, %[[VAL_233]], %[[VAL_2]] : index
				// CHECK: %[[VAL_235:.*]] = arith.andi %[[VAL_231]]#1, %[[VAL_234]] : i1
				// CHECK: %[[VAL_236:.*]] = arith.select %[[VAL_235]], %[[VAL_232]], %[[VAL_5]] : index
				// CHECK: %[[VAL_237:.*]] = tensor.extract_slice %[[VAL_36]]{{\[}}%[[VAL_236]], 0] {{\[}}%[[VAL_3]], 8] [1, 1] : tensor<?x8xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (?, ?, 1), (0, 8, 1) ] }>> to tensor<?x8xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (?, ?, 1), (0, 8, 1) ] }>>
				// CHECK: scf.yield %[[VAL_235]], %[[VAL_237]], %[[VAL_231]]#0, %[[VAL_232]], %[[VAL_238:.*]]#4 : i1, tensor<?x8xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], slice = [ (?, ?, 1), (0, 8, 1) ] }>>, index, index, tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: } attributes {"Emitted from" = "linalg.generic"}
				// CHECK: %[[VAL_239:.*]] = sparse_tensor.load %[[VAL_240:.*]]#4 hasInserts : tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: return %[[VAL_239]] : tensor<6x6xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>>
				// CHECK: }
				func.func @conv2d_all_sparse_CSR(%arg0: tensor<8x8xi32, #DCSR>,
				    %arg1: tensor<3x3xi32>) -> tensor<6x6xi32, #DCSR> {
				  %0 = bufferization.alloc_tensor() : tensor<6x6xi32, #DCSR>
				  %1 = linalg.generic {
				    indexing_maps = [#map, #map1, #map2],
				    iterator_types = ["parallel", "parallel", "reduction", "reduction"]}
				    ins(%arg0, %arg1 : tensor<8x8xi32, #DCSR>, tensor<3x3xi32>)
				    outs(%0 : tensor<6x6xi32, #DCSR>) {
				  ^bb0(%in: i32, %in_0: i32, %out: i32):
				    %2 = arith.muli %in, %in_0 : i32
				    %3 = arith.addi %out, %2 : i32
				    linalg.yield %3 : i32
				  } -> tensor<6x6xi32, #DCSR>
				  return %1 : tensor<6x6xi32, #DCSR>
				}

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_2d_slice_based.mlir

This file was added.

				// DEFINE: %{option} = "enable-slice-affine=true enable-runtime-library=false"
				// DEFINE: %{command} = mlir-opt %s --sparse-compiler=%{option} | \
				// DEFINE: mlir-cpu-runner \
				// DEFINE: -e entry -entry-point-result=void \
				// DEFINE: -shared-libs=%mlir_lib_dir/libmlir_c_runner_utils%shlibext | \
				// DEFINE: FileCheck %s
				//
				// RUN: %{command}

				#map = affine_map<(d0, d1, d2, d3) -> (d0 + d2, d1 + d3)>
				#map1 = affine_map<(d0, d1, d2, d3) -> (d2, d3)>
				#map2 = affine_map<(d0, d1, d2, d3) -> (d0, d1)>

				#DCSR = #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>

				module {
				  func.func @conv2d_all_sparse_CSR(%arg0: tensor<8x8xi32, #DCSR>, %arg1: tensor<3x3xi32>) -> tensor<6x6xi32, #DCSR> {
				    %0 = bufferization.alloc_tensor() : tensor<6x6xi32, #DCSR>
				    %1 = linalg.generic {
				      indexing_maps = [#map, #map1, #map2],
				      iterator_types = ["parallel", "parallel", "reduction", "reduction"]}
				      ins(%arg0, %arg1 : tensor<8x8xi32, #DCSR>, tensor<3x3xi32>)
				      outs(%0 : tensor<6x6xi32, #DCSR>) {
				    ^bb0(%in: i32, %in_0: i32, %out: i32):
				      %2 = arith.muli %in, %in_0 : i32
				      %3 = arith.addi %out, %2 : i32
				      linalg.yield %3 : i32
				    } -> tensor<6x6xi32, #DCSR>
				    return %1 : tensor<6x6xi32, #DCSR>
				  }

				  func.func @entry() {
				    %c0 = arith.constant 0 : index
				    %i0 = arith.constant 0 : i32

				    // A typical edge detection filter.
				    %filter = arith.constant dense<[
				      [ 1, 0, -1 ],
				      [ 0, 0, 0 ],
				      [ -1, 0, 1 ]
				    ]> : tensor<3x3xi32>

				    %input = arith.constant dense<[
				      [ 1, 2, 3, 4, 0, 6, 7, 8 ],
				      [ 2, 2, 4, 4, 0, 0, 6, 8 ],
				      [ 2, 2, 4, 4, 0, 0, 6, 8 ],
				      [ 2, 2, 3, 4, 0, 0, 7, 8 ],
				      [ 1, 3, 3, 4, 0, 0, 6, 8 ],
				      [ 3, 2, 3, 4, 0, 0, 7, 8 ],
				      [ 1, 3, 3, 4, 3, 6, 6, 8 ],
				      [ 1, 3, 3, 4, 3, 0, 7, 8 ]
				    ]> : tensor<8x8xi32>

				    %sparse_filter_CSR = sparse_tensor.convert %filter
				      : tensor<3x3xi32> to tensor<3x3xi32>

				    %sparse_input_CSR = sparse_tensor.convert %input
				      : tensor<8x8xi32> to tensor<8x8xi32, #DCSR>

				    %3 = call @conv2d_all_sparse_CSR(%sparse_input_CSR, %sparse_filter_CSR)
				      : (tensor<8x8xi32, #DCSR>,
				         tensor<3x3xi32>) -> tensor<6x6xi32, #DCSR>

				    %out = sparse_tensor.convert %3
				      : tensor<6x6xi32, #DCSR> to tensor<6x6xi32>
				    //
				    // CHECK: ( ( 0, 0, -1, -6, -1, 6 ),
				    // CHECK-SAME: ( -1, 0, 1, 0, 1, 0 ),
				    // CHECK-SAME: ( 0, -1, 1, 0, 0, 0 ),
				    // CHECK-SAME: ( -1, 0, 0, 0, 0, 0 ),
				    // CHECK-SAME: ( 0, 0, 3, 6, -3, -6 ),
				    // CHECK-SAME: ( 2, -1, 3, 0, -3, 0 ) )
				    //
				    %v2 = vector.transfer_read %out[%c0, %c0], %i0
				      : tensor<6x6xi32>, vector<6x6xi32>
				    vector.print %v2 : vector<6x6xi32>

				    return
				  }

				}
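A quick way to double-check the expected matrix in the CHECK lines is to recompute it densely. The C++ below is a minimal sketch, not part of this patch or of MLIR; the names `referenceAt`, `matchesCheckLines`, `kInput`, `kFilter`, and `kExpected` are hypothetical. It assumes only that the sparse kernel must agree exactly with the dense loop nest implied by `#map`, `#map1`, and `#map2`, since skipping zero entries of the `#DCSR` input cannot change any sum.

```cpp
#include <cassert>

// Hypothetical dense reference (not part of the patch) for the kernel in
// @conv2d_all_sparse_CSR. The indexing maps give
//   out[d0][d1] += in[d0 + d2][d1 + d3] * filter[d2][d3]
// with d0, d1 parallel over the 6x6 output and d2, d3 reducing over the
// 3x3 filter.
constexpr int kIn = 8;
constexpr int kK = 3;
constexpr int kOut = kIn - kK + 1;  // 6

// The constants from @entry.
constexpr int kFilter[kK][kK] = {{1, 0, -1}, {0, 0, 0}, {-1, 0, 1}};
constexpr int kInput[kIn][kIn] = {
    {1, 2, 3, 4, 0, 6, 7, 8}, {2, 2, 4, 4, 0, 0, 6, 8},
    {2, 2, 4, 4, 0, 0, 6, 8}, {2, 2, 3, 4, 0, 0, 7, 8},
    {1, 3, 3, 4, 0, 0, 6, 8}, {3, 2, 3, 4, 0, 0, 7, 8},
    {1, 3, 3, 4, 3, 6, 6, 8}, {1, 3, 3, 4, 3, 0, 7, 8}};

// The 6x6 matrix expected by the CHECK lines.
constexpr int kExpected[kOut][kOut] = {
    {0, 0, -1, -6, -1, 6}, {-1, 0, 1, 0, 1, 0},  {0, -1, 1, 0, 0, 0},
    {-1, 0, 0, 0, 0, 0},   {0, 0, 3, 6, -3, -6}, {2, -1, 3, 0, -3, 0}};

// Computes one output element by summing the 3x3 window anchored at (d0, d1).
// Zero input entries contribute nothing, so skipping them (as the sparse
// input does) yields the same value.
int referenceAt(int d0, int d1) {
  assert(d0 >= 0 && d0 < kOut && d1 >= 0 && d1 < kOut);
  int acc = 0;
  for (int d2 = 0; d2 < kK; ++d2)
    for (int d3 = 0; d3 < kK; ++d3)
      acc += kInput[d0 + d2][d1 + d3] * kFilter[d2][d3];
  return acc;
}

// Returns true if the dense reference reproduces every CHECK value.
bool matchesCheckLines() {
  for (int d0 = 0; d0 < kOut; ++d0)
    for (int d1 = 0; d1 < kOut; ++d1)
      if (referenceAt(d0, d1) != kExpected[d0][d1])
        return false;
  return true;
}
```

With this filter every entry reduces to in[i][j] - in[i][j+2] - in[i+2][j] + in[i+2][j+2]; for example, entry (0, 3) is 4 - 6 - 4 + 0 = -6, matching the first CHECK row.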

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_3d_slice_based.mlir

This file was added.

				// DEFINE: %{option} = "enable-slice-affine=true enable-runtime-library=false"
				// DEFINE: %{command} = mlir-opt %s --sparse-compiler=%{option} | \
				// DEFINE: mlir-cpu-runner \
				// DEFINE: -e entry -entry-point-result=void \
				// DEFINE: -shared-libs=%mlir_lib_dir/libmlir_c_runner_utils%shlibext | \
				// DEFINE: FileCheck %s
				//
				// RUN: %{command}

				#CCC = #sparse_tensor.encoding<{
				  dimLevelType = [ "compressed", "compressed", "compressed" ]
				}>

				func.func @alloc_3d_filled_f32(%s1 : index, %s2 : index, %s3 : index, %f : f32) -> tensor<?x?x?xf32> {
				  %buf = bufferization.alloc_tensor(%s1, %s2, %s3) : tensor<?x?x?xf32>
				  %ret = linalg.fill ins(%f : f32) outs(%buf : tensor<?x?x?xf32>) -> tensor<?x?x?xf32>
				  return %ret : tensor<?x?x?xf32>
				}

				func.func @conv_3d_CCC(%arg0: tensor<?x?x?xf32, #CCC>, %arg1: tensor<?x?x?xf32>) -> tensor<?x?x?xf32, #CCC> {
				  %c6 = arith.constant 6 : index
				  %s = bufferization.alloc_tensor(%c6, %c6, %c6) : tensor<?x?x?xf32, #CCC>
				  %ret = linalg.conv_3d
				    ins (%arg0, %arg1: tensor<?x?x?xf32, #CCC>, tensor<?x?x?xf32>)
				    outs (%s: tensor<?x?x?xf32, #CCC>) -> tensor<?x?x?xf32, #CCC>
				  return %ret : tensor<?x?x?xf32, #CCC>
				}

				func.func @entry() {
				  %c0 = arith.constant 0 : index
				  %c1 = arith.constant 1 : index
				  %c3 = arith.constant 3 : index
				  %c6 = arith.constant 6 : index
				  %c8 = arith.constant 8 : index
				  %f10 = arith.constant 10.00000e+00 : f32
				  %val = arith.constant 2.00000e+00 : f32
				  %zero = arith.constant 0.00000e+00 : f32

				  %filter3D = call @alloc_3d_filled_f32(%c3, %c3, %c3, %val) : (index, index, index, f32) -> (tensor<?x?x?xf32>)
				  %in3D_tmp = call @alloc_3d_filled_f32(%c8, %c8, %c8, %val) : (index, index, index, f32) -> (tensor<?x?x?xf32>)
				  %in3D = tensor.insert %f10 into %in3D_tmp[%c0, %c3, %c0] : tensor<?x?x?xf32>
				  %out3D = call @alloc_3d_filled_f32(%c6, %c6, %c6, %zero) : (index, index, index, f32) -> (tensor<?x?x?xf32>)

				  %in3D_CCC = sparse_tensor.convert %in3D
				    : tensor<?x?x?xf32> to tensor<?x?x?xf32, #CCC>
				  %CCC_ret = call @conv_3d_CCC(%in3D_CCC, %filter3D) : (tensor<?x?x?xf32, #CCC>, tensor<?x?x?xf32>) -> (tensor<?x?x?xf32, #CCC>)
				  // CHECK: ( ( ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 124, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 124, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 124, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				  // CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				  // CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				  // CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				  // CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				  // CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				  // CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ) )
				  %1 = sparse_tensor.convert %CCC_ret
				    : tensor<?x?x?xf32, #CCC> to tensor<?x?x?xf32>
				  %v1 = vector.transfer_read %1[%c0, %c0, %c0], %zero
				    : tensor<?x?x?xf32>, vector<6x6x6xf32>
				  vector.print %v1 : vector<6x6x6xf32>

				  // Free the resources.
				  bufferization.dealloc_tensor %in3D : tensor<?x?x?xf32>
				  bufferization.dealloc_tensor %filter3D : tensor<?x?x?xf32>

				  bufferization.dealloc_tensor %in3D_CCC : tensor<?x?x?xf32, #CCC>
				  bufferization.dealloc_tensor %CCC_ret : tensor<?x?x?xf32, #CCC>

				  return
				}
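The 108/124 pattern in the CHECK lines can likewise be cross-checked with a dense loop nest. The C++ below is a minimal sketch, not part of this patch or of MLIR; `conv3dReference`, `countEqual`, `at`, and the `kN`/`kF`/`kM` constants are hypothetical names. It assumes `linalg.conv_3d` computes out[i][j][k] += in[i+a][j+b][k+c] * filter[a][b][c], and it rebuilds the operands from `@entry` rather than reading them from MLIR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense reference (not part of the patch) for the linalg.conv_3d
// in @conv_3d_CCC:
//   out[i][j][k] += in[i + a][j + b][k + c] * filter[a][b][c]
// with an 8x8x8 input, a 3x3x3 filter, and a 6x6x6 output.
constexpr int kN = 8;            // input extent per dimension
constexpr int kF = 3;            // filter extent per dimension
constexpr int kM = kN - kF + 1;  // output extent per dimension (6)

// Row-major index into a cube whose side is n.
int at(int n, int i, int j, int k) {
  assert(i >= 0 && i < n && j >= 0 && j < n && k >= 0 && k < n);
  return (i * n + j) * n + k;
}

// Rebuilds the operands from @entry (every element 2.0, except the input
// holds 10.0 at [0][3][0]) and returns the flattened 6x6x6 result.
std::vector<float> conv3dReference() {
  std::vector<float> in(kN * kN * kN, 2.0f);
  std::vector<float> filter(kF * kF * kF, 2.0f);
  std::vector<float> out(kM * kM * kM, 0.0f);
  in[at(kN, 0, 3, 0)] = 10.0f;
  for (int i = 0; i < kM; ++i)
    for (int j = 0; j < kM; ++j)
      for (int k = 0; k < kM; ++k)
        for (int a = 0; a < kF; ++a)
          for (int b = 0; b < kF; ++b)
            for (int c = 0; c < kF; ++c)
              out[at(kM, i, j, k)] +=
                  in[at(kN, i + a, j + b, k + c)] * filter[at(kF, a, b, c)];
  return out;
}

// Counts elements equal to `value`; exact comparison is safe here because
// every partial sum is a small integer representable in float.
int countEqual(const std::vector<float> &v, float value) {
  int n = 0;
  for (float x : v)
    if (x == value)
      ++n;
  return n;
}
```

Every window of 27 elements sums to 27 * 2 * 2 = 108, except that the single 10.0 at input [0][3][0] falls inside the windows of outputs [0][1][0], [0][2][0], and [0][3][0], each of which gains (10 - 2) * 2 = 16, giving the three 124 entries in the first block of the CHECK lines.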

[mlir][sparse] extend loop emitter to emit slice driven loops (Closed, Public)

Revision Contents

Diff 493420

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp

mlir/test/Dialect/SparseTensor/sparse_conv_2d_slice_based.mlir

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_2d_slice_based.mlir

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_3d_slice_based.mlir
