This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
412	What do you mean here? I'm guessing this should be "level", but I'm not sure what you mean by "favor(ing) constant levels"
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
138	This should remain "/// Exits"
239–260	Can you make this class "`final`" (ditto for the `LoopInfo`). You don't need/use subclassing here, and marking it final helps the compiler generate more efficient code
243	I'm guessing this should be `LoopOrd`. If not, then what is it?
315	This should be `Level lvl` (I'm pretty sure)
417	Should this be `LoopOrd`? If not, then what is it?
439	What is this supposed to be: `LoopOrd`, `LoopId`, other?

wrengr added inline comments.Mar 22 2023, 2:10 PM

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
37–38	I'd prefer these be defined as `static inline` functions rather than as macros, since that gives better type-safety and compiler error messages. If you need to use CMPI at several different types, then just use a template.
45–47	You should just use `ValueRange::getTypes()`
328–331	Please keep this as `Level l`
441	Please use `TensorId` here. I am just about to upload the CLs that make that into a newtype, so it's important to use the correct type instead of just using `size_t`/`unsigned` everywhere
442	Please define a `dyn_cast` variant of the `getSparseTensorType` function, and use that here and everywhere else. The `SparseTensorType` was created specifically to help avoid several code legibility and correctness concerns, so you should be using it everywhere possible.
446	You should be using `SparseTensorType::getLevelRank` here, since it is specifically the level-rank you want not the dim-rank
447	Please use `Level` for all levels. Even though it's just a typedef for now, I will be converting it to a proper type in the near future, so you should use the correct type rather than just using `unsigned` everywhere
488	I think it'd be clearer to just use `const auto &` here
651	Use "l" or "lvl" here. The name "d" is reserved for things of `Dimension` type, whereas this has `Level` type.
827	Why remove the const? It's clearer to know when local variables will never change
834	Please don't undo my factoring this out into a local variable. The condition is much easier to read when (1) it's all on one line, and (2) avoids repeating common expressions which forces the reader to double check if they are indeed the same or not.
1212	It would be clearer to use `break` in the then-branch. That keeps you from needing to indent the else-branch (which is very long), and helps the reader avoid needing to check to see if there's something else after the else-branch.
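The macro-versus-function point raised for lines 37–38 can be illustrated with a minimal sketch. `Builder`, `Location`, `Pred`, `genCmpI`, and `CMPI` below are hypothetical stand-ins so the snippet compiles without MLIR; they are not the code under review.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for mlir::OpBuilder and mlir::Location; they only
// record what was "emitted" so the sketch compiles without MLIR.
struct Builder { std::vector<std::string> ops; };
struct Location { int line; };

enum class Pred { ult, eq };

// Function style: every argument is explicit and type-checked, and a
// mismatch is reported at the call site rather than inside an expansion.
static inline bool genCmpI(Builder &b, Location loc, Pred p, long lhs, long rhs) {
  b.ops.push_back("cmpi@" + std::to_string(loc.line));
  return p == Pred::ult ? lhs < rhs : lhs == rhs;
}

// Macro style: shorter at the call site, but it silently requires variables
// named exactly `builder` and `loc` to be in scope wherever it is used.
#define CMPI(p, lhs, rhs) genCmpI(builder, loc, Pred::p, (lhs), (rhs))
```

With `Builder builder; Location loc{42};` in scope, `CMPI(ult, 1, 2)` and `genCmpI(builder, loc, Pred::ult, 1, 2)` do the same thing; the macro saves typing, while the function keeps its dependencies visible. If several operand types were needed, `genCmpI` could become a template, as the comment suggests.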

Peiming marked an inline comment as done.Mar 22 2023, 2:20 PM

Peiming added inline comments.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
37–38	I will stick with this. The reason I use macro is that I want to avoid typing `builder` and `loc`.
834	Okay, it is a mistake I made during rebasing.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
417	This is `unsigned index` to another array (not the loop sequence). I will stick with this.
439	This is an `unsigned` counter.

address comments + fix rebase mistakes.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
412	I mean picking a statically known dimension size instead of a dynamic one when there are multiple candidates (i.e., when `DimOp` folds to a constant value). It might lead to slightly better code.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
827	rebase mistake.
1212	I found `else` is easier to follow, because the control flow is more straightforward.

remove useless comments.

wrengr added inline comments.Mar 22 2023, 3:35 PM

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
43	Please don't undo this variable naming. The "mem" matches other places, and avoids confusion about whether "ptr" means `MemRef` vs the old "pointer" (now "position") vs llvm-pointers vs...
267–272	You should use the `numTensors` variable instead of calling `tensors.size()` repeatedly. (This is for code clarity rather than performance reasons)
334–338	It would be clearer to combine these together. Also, it would be clearer to use `continue` rather than extra indentation for the conditional. Putting those together, maybe use something like `if (depends == 0) continue; assert(!reassoc); sliceSizes[...`
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
138	You want to keep the triple-slash "///", since that's what the tooling uses for generating the API documentation

Harbormaster completed remote builds in B221147: Diff 507519.Mar 22 2023, 3:53 PM

fix rebase mistakes.

Harbormaster completed remote builds in B221334: Diff 507769.Mar 23 2023, 10:02 AM

fix some TODOs.

Harbormaster completed remote builds in B221355: Diff 507799.Mar 23 2023, 11:26 AM

wrengr added inline comments.Mar 24 2023, 6:15 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
455	That's the wrong bound to assert; you should instead compare against `levelToDependentIdx[t].size()` or `maxLvlRank`. For correctness you should also `assert(i < numLoops && t < numTensors)`. If you rebase over D146684 to get the assertion helpers, then the full assertion would be `assert(isValidLevel(t, lvl) && isValidLoopId(i) && !loopToDependencies[i][t].has_value())`. The nice thing about those assertion helpers is that it gives the correct bound for the level (rather than falling back to `maxLvlRank`), and also guards against the case where `i == kInvalidId`.
456	Isn't it redundant to store the level-type in `loopToDependencies`, since it's already stored in `lvlTypes`? That is, afaict the following code snippet should always return successfully (assuming `i` and `t` are valid): `if (const auto dep = loopToDependencies[i][t]) { const Level depLvl = (*dep).first; const auto depOptLoop = lvlToLoop[t][depLvl]; assert(depOptLoop); const LoopId depLoop = *depOptLoop; assert(lvlTypes[t][depLoop] == (*dep).second); }` Assuming that's correct, then you shouldn't store the level-type in `loopToDependencies` because it's redundant information. Or if you absolutely must store the redundant copy for some reason, then you need to verify that the level-type agrees with the one in `lvlTypes` (or if the one in `lvlTypes` is undefined, then you need to store the `dlt` parameter there too; and conversely you need to adjust the `setLevelAndType` method to also verify consistency with `loopToDependencies`). I agree that it's rather convoluted to need to say `lvlTypes[t][*(lvlToLoop[t][*(loopToDependencies[i][t])])]`, but the only solution to that is to reconsider the design of all the fields of the `Merger` class. For example, if we had `lvlTypes : (TensorId, Level) -> LevelType` instead of the current `lvlTypes : (TensorId, LoopId) -> LevelType`, then you wouldn't need to use `lvlToLoop` there. Of course, you'd have to redo the rest of the `Merger` code to make that work (which may end up inserting more `loopToLevel` uses than however many `levelToLoop` uses it removes). Or a different design would be to combine `lvlToLoop` and `lvlTypes` into a single `lvlInfo : (TensorId, Level) -> (LevelType, optional<LoopId>)`; of course that would require making different changes to the rest of the `Merger` code. In any case, I think it would be wise to wait for D146693 to land before trying to make any of these changes, since the newtypes of that CL will greatly simplify the process of rearranging all these vectors.
664	This should have been named `levelToDependentLoop`, since we don't use "idx" in this file anymore because it causes too much confusion. I mentioned this in the previous CL that introduced this field, but you landed the CL without fixing it
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
228	This comment should explain how exactly the `slicedTids` differs from the other `tids`. Also, is the same tensor allowed to occur in both fields? If so, then what does that mean? and, can the same level of the same tensor occur on both of the corresponding fields?
229–233	I think it'd be better to combine these all into a single `const SmallVector<LoopLevelInfo>` where `struct LoopLevelInfo final { TensorId tid; Level lvl; bool isSlice; bool isReduced; };` —assuming it's okay to combine the original `(tids,lvls)` with the new `(slicedTids,slicedLvls,sliceReduced)` into a single vector/set. If that's not okay for some reason, then I still think it'd be good to use a single `const SmallVector<LoopSlicedLevelInfo>` field for the new stuff. Using AOS will ensure that there's the right number of all the things that should correspond, as well as keeping the corresponding things close together. Plus it'll make it easier to add additional fields in the future as needed.
230	"levels"
232	This comment is wrong for this field
240	"...do not need to actually create a sparse..."
241	"...only need to maintain the..."
242	Is this actually the full `MemRef` of coordinates for all levels? If so then it should be "minCoords" (with an "s"). Whereas if it's just a single coordinate for the given level, then it should be "minCrd".
250	Given the comment, I think this would be better named something like "isFirstSlice" or "isInitialSlice".
252	If the value is just a single coordinate, then this should also be singular.
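The array-of-structs layout proposed for lines 229–233 could look roughly like the sketch below. The struct fields come from the comment itself; `std::vector` stands in for `SmallVector`, the plain `unsigned` aliases stand in for the real `TensorId`/`Level` types, and `countSlices` is a hypothetical helper added only to show usage.

```cpp
#include <cassert>
#include <vector>

// Plain aliases here; the review notes these are due to become newtypes.
using TensorId = unsigned;
using Level = unsigned;

// One record per (tensor, level) visited by a loop, so the corresponding
// pieces of information cannot get out of sync the way parallel vectors can.
struct LoopLevelInfo final {
  TensorId tid;
  Level lvl;
  bool isSlice;
  bool isReduced;
};

// Hypothetical helper: counts the entries that refer to slices.
static inline unsigned countSlices(const std::vector<LoopLevelInfo> &infos) {
  unsigned n = 0;
  for (const LoopLevelInfo &info : infos)
    if (info.isSlice)
      ++n;
  return n;
}
```

Because every entry carries all four fields, adding a new per-level attribute later means touching one struct rather than keeping several vectors the same length.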

wrengr added inline comments.Mar 24 2023, 7:02 PM

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
246	Given our discussion of boolean blindness, this assertion suggests that the two parameters should be combined into a single `std::optional<std::pair<Level, Value>>` parameter. (Albeit you'll still want to assert you didn't get a null `Value`.) If that combined parameter doesn't work, why not? The only other thing that would be consistent with the assertion is `union{ non-null-Value; struct{non-null-Value; Level}}` but that's equivalent to `struct{non-null-Value; optional<Level>}`, so if that's what you want then you should `assert(minCoord)` instead.
386–388	This description doesn't make sense to me. Do you have a design doc that explains what exactly you mean by "slice" in this context, and explains how/why "reducing `d0+d1+d2`" translates into needing the two slices you mention?
392	Either "number of constraints needed to..." or "number of constraints that are needed to..."
393	"level"
398	That should be "A[i+j] => A[i+2]" to make it clear that it derives from "j => 2".
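A minimal sketch of the combined-parameter idea from the comment on line 246, assuming a stand-in `Value` type with a null state. `SliceInfoSketch` is a hypothetical name; the actual struct and constructor in `LoopEmitter.h` are not reproduced here.

```cpp
#include <cassert>
#include <optional>
#include <utility>

using Level = unsigned;

// Hypothetical stand-in for mlir::Value: a handle that may be null.
struct Value {
  const void *impl = nullptr;
  explicit operator bool() const { return impl != nullptr; }
};

// Instead of passing a Level and a Value separately and asserting that they
// are either both meaningful or both absent, carry them as one optional pair.
struct SliceInfoSketch {
  std::optional<std::pair<Level, Value>> minCrd;

  explicit SliceInfoSketch(std::optional<std::pair<Level, Value>> mc)
      : minCrd(std::move(mc)) {
    // The optional rules out "level without value"; a null Value inside a
    // present pair still has to be rejected explicitly.
    assert((!minCrd || minCrd->second) && "present entry needs a non-null Value");
  }

  bool hasMinCrd() const { return minCrd.has_value(); }
};
```

The type now encodes the "both or neither" relationship, so only the null-`Value` case remains as a runtime assertion.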

Peiming marked an inline comment as done.Mar 27 2023, 9:43 AM

Peiming added inline comments.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
455	Yeah, I haven't rebased against the change that introduced maxLvlRank yet.
456	No, it is not redundant. `lvlTypes` is currently a mapping for `(tid, loopid) => dlt`, not `(tid, level) => dlt`. I think storing a pair makes more sense than introducing a complete `(tid, level) => dlt` map here because the dlt is only required here when there are non-trivial index expressions on the level.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
43	Sorry, it probably got overlooked during rebasing.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
229–233	I agree with you on this, in fact, see my comment at L222, I will do it in a separate patch though.
250	I will change the comment, it means whether it is the initial tensor that has not yet been sliced.
386–388	Yeah, I am writing a paper on this (but it is still at a very early stage). I will share it with you later when it is more or less complete.

rebase + address comments.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
246	It should work, I will add a TODO here and submit the change in a separate patch.

Harbormaster completed remote builds in B222096: Diff 508772.Mar 27 2023, 1:36 PM

fix windows building errors.

fix variable's name.

Harbormaster completed remote builds in B222114: Diff 508801.Mar 27 2023, 3:05 PM

rebase.

split up complicated functions

Harbormaster completed remote builds in B222774: Diff 509708.Mar 30 2023, 10:46 AM

rebase.

Harbormaster completed remote builds in B223612: Diff 510849.Apr 4 2023, 10:50 AM

Peiming added a child revision: D147550: [mlir][sparse] implement index redution on dense level (for CSR).Apr 4 2023, 11:23 AM

aartbik added inline comments.Apr 5 2023, 9:25 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
412	As a TODO?
412	Please mark such comments with a TODO, since you clearly have an idea on how to do it better. That way we can periodically grep for TODO's and fix them. Unless you think we will never do that, and then this is a note to self that should not be here.
453–454	since you added a parameter, you also need to update this comment
458	Top-level comments usually apply to the block of code. I find e.g. `assert(!loopToDependencies[i][t].has_value()); // must be first definition` a lot more readable, since you follow it directly with the make pair/pushback
470	The non-trivial concept is used more widely now, but it would still be nice to define this per file, or at least per first occurrence what non-trivial really means. Or perhaps we should start using more standard terminology on affine expressions?
503	"tensor level with index expression on it" reads awkward. How about "must be a tensor level that contains a non-trivial index expression"?
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
680	If the slice
682	I find this block of code extremely hard to read. Any way to factor this into slightly smaller methods and combine these?
821	A note "make sure" is very ambiguous. Is that a note to self, or something that the code actively does. Much better is to use an affirmative statement
822	"appears first than normal tensors" -> "appears before normal tensors"?
1212	I agree the else is very long and deep. Why not `if (!resolved) { genSlice; continue; } ...`?
1315	Ok, this block is where all the magic happens ;-) I need to do one more careful pass over this...
1786	The next
1864	Sets (or Increment), but use one style
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
85	comments with should or must are a bit dangling unless you say what happens when this assumption is not true
138	Why did you change Exits -> exit? Original seems okay
209–210	since we have several nested structs, can you give each a short comment (as documentation, and to improve readability)
228	Here and below (and above), period at end
390	const ref?
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
1685	I would start with the same comment as in the else, and state it in the affirmative rather than the speculative: `// End either a for-loop or a while-loop that iterates over a slice.`
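The control-flow suggestion for line 1212 (handle the short case first, then `continue`) can be sketched as follows. `Slice`, `processNested`, and `processFlat` are hypothetical; the increments merely stand in for `genSlice` and for the long else-branch in the real code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: whether a slice is already resolved, and its size.
struct Slice { bool resolved; int size; };

// Nested form: the long branch lives inside an else.
static inline int processNested(const std::vector<Slice> &slices, int &work) {
  int generated = 0;
  for (const Slice &s : slices) {
    if (!s.resolved) {
      ++generated;    // stands in for genSlice(...)
    } else {
      work += s.size; // stands in for the long else-branch
    }
  }
  return generated;
}

// Flat form: the short case exits early, so the long case needs no extra
// indentation and the reader need not look for code after the else.
static inline int processFlat(const std::vector<Slice> &slices, int &work) {
  int generated = 0;
  for (const Slice &s : slices) {
    if (!s.resolved) {
      ++generated;
      continue;
    }
    work += s.size;
  }
  return generated;
}
```

Both functions compute the same result; the difference is purely in how much nesting the reader has to track.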

address some comments.

Herald added a subscriber: bviyer.Apr 6 2023, 2:15 PM

Harbormaster completed remote builds in B224103: Diff 511524.Apr 6 2023, 2:32 PM

simplify code.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
682	better now?

fix rebase issues.

Harbormaster completed remote builds in B224118: Diff 511543.Apr 6 2023, 3:29 PM

fix typos.

Harbormaster completed remote builds in B224648: Diff 512264.Apr 10 2023, 2:50 PM

aartbik added inline comments.Apr 12 2023, 2:23 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
458	can we make this lvl < .... part also a helper method (that way, all our asserts read almost like English ;-)
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
84	This computation does not match my mental interpretation of the text above (L79)
88	// offset adds very little, Either use a sentence or remove
270	why the empty lines here?
335	We need `depends - 1` slices to make sure you don't read depends as part of the sentence
449	The comment applies to the assert, but the declaration is in between
490	Please elaborate. "Pop out" is not at all representative for what follows
682	Yes, although it could still use a bit more doc on what each block does (on entry of each block). Also, I would not overuse the "NOTE" part, in principle, all comments are NOTEs and we should only use them when something really should jump out
767	isn't that always the case here? Should that not be part of the method description then?
1318	I think this still needs some work to make reading the block easier. The problem is that you have very concise comments in the header (Generates .....), which is okay, since i don't want to see more there, but very few comments here, where it matters. So I would still give every implementation function here an entry comment, but one that shows what is generated, using some pseudo-code of the output That way, on entry of each method, I know what to expect, and dive into the various blocks with more pre-knowledge on what they do WDYT?
1331	this one seems out of place (all others generate stuff) perhaps move it up or down in the method order (also in header)
1388	here and a few other places, no period at end, please make one last pass over all new comments here
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
85	I think what is still missing is whether it is enforced (viz. asserts fail when trying to set it) or whether clients are responsible. So, something like Clients are responsible for ensuring that the order of the returned (I think my original comment was really on when I see "should" or "must", who is to blame in the end ;-)
256	Wren can correct me if I am wrong, but I think this needs to be minimum, right (as in smallest value, and not the lowest according to some other measure)?
322	"is exceeds" -> "exceeds", but more importantly, I would state this affirmatively: "We break out of the loop when the coordinate exceeds the sliceSize."
386	the most recent slice (singular)
394	perhaps we should discuss somewhere else, but we use "unsigned" at most places, and size_t only for local operations, or inside casts and asserts Since this is part of the API, I would prefer keeping it to unsigned, unless you have very strong reasons for this
458	as follows?
479	period at end. "to allocate top level local" makes very little sense when read in isolation. Just say what code fragment this points to
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
87	I know this was already there, but can we use override here to make it more clear that we are implementing the base visitor class? Or at the very least group all overrides into a // Visitor method overrides. ... section?
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp
418–420	has the `locate` property as well

wrengr added inline comments.Apr 12 2023, 2:53 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
458	Why do you need this extra assertion? The `isValidLevel` assertion already ensures that the `lvl` is valid for the tensor `t`. Therefore, rather than checking the `lvl` twice, the rest of the code should instead maintain the invariant that `levelToDependentLoop[t].size() == lvlTypes[t].size()`.

wrengr added inline comments.Apr 12 2023, 3:56 PM

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
456	There is a lot of unnecessary complexity and redundancy in storing all of: `lvlToLoop: (tid, lvl) -> optional<loopid>` `loopToLvl: (tid, loopid) -> optional<lvl>` `lvlTypes : (tid, loopid) -> dlt` `loopToDependencies : (loopid, tid) -> (lvl, dlt)` `levelToDependentLoop : (tid, lvl) -> set<loopid>` As I mentioned in my earlier comment, we can easily reconstruct the desired `(tid, lvl) -> dlt` map via `[](t, l) { auto i = lvlToLoop[t][l]; return i ? lvlTypes[t][*i] : Undef; }`. Therefore, we always have that `loopToDependencies[i][t] == make_pair(l, reconstructedLvlTypes[t][l])`. Consequently: since the first part of `loopToDependencies` has that every `(t,i)` pair determines `l`, it is trivial to construct the required `(t,l)` pair for passing to `reconstructedLvlTypes`; and since it's trivial to define `reconstructedLvlTypes`, therefore there is no benefit to storing this redundant information. And as I said before, whenever we store redundant information that means we must also therefore take pains to ensure that all the copies of that information remain consistent. I agree that it would be nice to store the `(tid, lvl) -> dlt` map directly, and to use that in lieu of the current `(tid, loopid) -> dlt` map. Especially since the former can be quickly constructed from the types of the tensors, and doesn't require knowing anything about `lvlToLoop`/`loopToLvl`. However, regardless of which one we store, the point remains the same: there's no benefit to `loopToDependencies` storing this information redundantly, and if it stores redundant information anyways then it needs to ensure that it remains consistent with the `(tid, {lvl,loopid}) -> dlt` map.
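The reconstruction lambda quoted in the comment above can be written out as a small self-contained sketch. Only `lvlToLoop` and `lvlTypes` mirror names from the review; `MergerSketch`, the `std::vector` storage, and the three-value `DimLevelType` enum are simplifying assumptions.

```cpp
#include <cassert>
#include <optional>
#include <vector>

using TensorId = unsigned;
using Level = unsigned;
using LoopId = unsigned;

// Simplified stand-in for the real level-type enum.
enum class DimLevelType { Undef, Dense, Compressed };

struct MergerSketch {
  // (tid, lvl) -> optional<loopid>
  std::vector<std::vector<std::optional<LoopId>>> lvlToLoop;
  // (tid, loopid) -> dlt
  std::vector<std::vector<DimLevelType>> lvlTypes;

  // Derives the (tid, lvl) -> dlt view on demand instead of storing it,
  // so there is no second copy that could drift out of sync.
  DimLevelType reconstructedLvlType(TensorId t, Level l) const {
    const std::optional<LoopId> i = lvlToLoop[t][l];
    return i ? lvlTypes[t][*i] : DimLevelType::Undef;
  }
};
```

In this sketch a level with no associated loop simply reports `Undef`, matching the fallback in the quoted lambda.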

address comments.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
458	I found this is actually a redundant check: if the lvl is valid, then it is definitely in bounds.
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
87	I added a comment. This is a non-virtual function, so I did not use override here.

Peiming marked 2 inline comments as done.Apr 12 2023, 5:38 PM

Peiming added inline comments.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
456	I agree that we can probably clean it up, but I will address this in separate patches though ;-)

aartbik accepted this revision.Apr 12 2023, 5:55 PM

aartbik added inline comments.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
1571	I first thought this was commented out code ;-) So make it Generate: code

This revision is now accepted and ready to land.Apr 12 2023, 5:55 PM

Harbormaster completed remote builds in B225219: Diff 513023.Apr 12 2023, 6:00 PM

address comment.

Harbormaster completed remote builds in B225240: Diff 513049.Apr 12 2023, 8:23 PM

fix test case memory leakage.

This revision was landed with ongoing or failed builds.Apr 12 2023, 8:29 PM

Closed by commit rG5fd9d801350d: [mlir][sparse] extend loop emitter to emit slice driven loops (authored by Peiming).

This revision was automatically updated to reflect the committed changes.

Peiming added a commit: rG5fd9d801350d: [mlir][sparse] extend loop emitter to emit slice driven loops.

Harbormaster completed remote builds in B225243: Diff 513052.Apr 12 2023, 8:45 PM

Peiming mentioned this in D148565: [mlir][sparse] group tensor id and levels into pairs in loop emitter.Apr 17 2023, 1:10 PM

Peiming mentioned this in rG36c95ee739c0: [mlir][sparse] group tensor id and levels into pairs in loop emitter.May 4 2023, 9:15 AM

Hi @Peiming, the buildbots are failing (e.g. https://lab.llvm.org/buildbot/#/builders/160/builds/19165) - could you please fix it?

In D142930#4319290, @vzakhari wrote:

Hi @Peiming, the buildbots are failing (e.g. https://lab.llvm.org/buildbot/#/builders/160/builds/19165) - could you please fix it?

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

In D142930#4319294, @Peiming wrote:

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

@vzakhari this is not the patch that triggers the complaining. If you are going to revert, please make sure you revert the right one, which is https://reviews.llvm.org/D148565

In D142930#4319326, @vzakhari wrote:

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

I can not see any related file in D148565 either....

@vzakhari I do not think my patch caused the error, see https://lab.llvm.org/buildbot/#/builders/160/builds/19161, there was already the same warning (but I do not know why it was not treated as errors).

In D142930#4319341, @Peiming wrote:

I can not see any related file in D148565 either....

Let me explain what I see: I looked at https://lab.llvm.org/buildbot/#/builders/160 and the first failing build was 19162: https://lab.llvm.org/buildbot/#/builders/160/builds/19162; it points to D148565. The build issue is shown in the stdio section:

FAILED: tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_CUDA_CONVERSIONS_ENABLED=0 -DMLIR_ROCM_CONVERSIONS_ENABLED=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/tools/mlir/lib/Dialect/SparseTensor/Transforms -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/lib/Dialect/SparseTensor/Transforms -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/llvm/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/tools/mlir/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o -MF tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o.d -o tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
In file included from ../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp:17:
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h: In member function ‘constexpr mlir::sparse_tensor::TensorLevel mlir::sparse_tensor::LoopEmitter::makeTensorLevel(mlir::sparse_tensor::TensorId, mlir::sparse_tensor::Level) const’:
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h:199:29: error: call to non-‘constexpr’ function ‘unsigned int mlir::sparse_tensor::LoopEmitter::getNumTensors() const’
  199 |     return l * getNumTensors() + t;
      |                ~~~~~~~~~~~~~^~
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h:195:12: note: ‘unsigned int mlir::sparse_tensor::LoopEmitter::getNumTensors() const’ declared here
  195 |   unsigned getNumTensors() const { return tensors.size(); }
      |            ^~~~~~~~~~~~~
90.309 [1754/1/4349] Linking CXX shared library lib/libclang-cpp.so.17git
ninja: build stopped: subcommand failed.

So it does point to the change from D148565.

Thanks! Now I see it! https://reviews.llvm.org/D149874 should fix it.
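The GCC error in the log above comes from a `constexpr` member function calling a member function that is not `constexpr`. One plausible way to resolve it, sketched below, is to drop `constexpr` from the caller; whether D149874 took this route or another is not stated in this thread. `LoopEmitterSketch` and `unpackTensorLevel` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using TensorId = unsigned;
using Level = unsigned;
using TensorLevel = unsigned;

class LoopEmitterSketch {
public:
  explicit LoopEmitterSketch(unsigned numTensors) : tensors(numTensors) {}

  // Not constexpr: it reads the size of a runtime container.
  unsigned getNumTensors() const {
    return static_cast<unsigned>(tensors.size());
  }

  // Deliberately NOT constexpr. Declaring this constexpr while it calls the
  // non-constexpr getNumTensors() is what GCC rejected in the log.
  TensorLevel makeTensorLevel(TensorId t, Level l) const {
    return l * getNumTensors() + t;
  }

  // Illustrative inverse of the packing above.
  std::pair<TensorId, Level> unpackTensorLevel(TensorLevel tl) const {
    const unsigned n = getNumTensors();
    return {tl % n, tl / n};
  }

private:
  std::vector<int> tensors; // stand-in for the real tensor list
};
```

With three tensors, tensor 2 at level 4 packs to `4 * 3 + 2`, and unpacking recovers the original pair.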

Revision Contents

Path	Size
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h	49 lines
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h	171 lines
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp	935 lines
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp	2 lines
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp	41 lines
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp	53 lines
mlir/test/Dialect/SparseTensor/sparse_conv_2d_slice_based.mlir	284 lines
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_2d_slice_based.mlir	81 lines
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_3d_slice_based.mlir	97 lines

Diff 509708

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h

[403 lines not shown]
public:

/// Sets the level number and level-type of the `t`th tensor on		/// Sets the level number and level-type of the `t`th tensor on
/// `i`th loop.		/// `i`th loop.
void setLevelAndType(TensorId t, LoopId i, Level lvl, DimLevelType dlt) {		void setLevelAndType(TensorId t, LoopId i, Level lvl, DimLevelType dlt) {
assert(isValidLevel(t, lvl) && isValidLoopId(i) && isValidDLT(dlt));		assert(isValidLevel(t, lvl) && isValidLoopId(i) && isValidDLT(dlt));
lvlTypes[t][i] = dlt;		lvlTypes[t][i] = dlt;
loopToLvl[t][i] = lvl;		loopToLvl[t][i] = lvl;
lvlToLoop[t][lvl] = i;		lvlToLoop[t][lvl] = i;
		// Maybe we should favor a constant loop bound when there are multiple
		// choices.
		loopBounds[i] = std::make_pair(t, lvl);
}		}

using ForeachTensorLoopIdCallback = function_ref<void(		using ForeachTensorLoopIdCallback = function_ref<void(
TensorLoopId, TensorId, std::optional<Level>, DimLevelType, bool)>;		TensorLoopId, TensorId, std::optional<Level>, DimLevelType, bool)>;

/// Iterates over a set of `TensorLoopId`s, invoking the callback		/// Iterates over a set of `TensorLoopId`s, invoking the callback
/// for each `TensorLoopId` and passing it the corresponding tensor		/// for each `TensorLoopId` and passing it the corresponding tensor
/// identifier, level, and level-type, following with a boolean value		/// identifier, level, and level-type, following with a boolean value
[11 lines not shown]
void foreachTensorLoopId(LatPointId p, bool simple,
for (const TensorLoopId b : bits.set_bits()) {		for (const TensorLoopId b : bits.set_bits()) {
const TensorId t = tensor(b);		const TensorId t = tensor(b);
const auto optLvl = getLvl(b);		const auto optLvl = getLvl(b);
const auto lvlTp = getDimLevelType(b);		const auto lvlTp = getDimLevelType(b);
if (isLvlWithNonTrivialIdxExp(b)) {		if (isLvlWithNonTrivialIdxExp(b)) {
// This must be an undefined level.		// This must be an undefined level.
assert(!optLvl.has_value());		assert(!optLvl.has_value());
// Slice the tid along the dependent level to iterate current loop.		// Slice the tid along the dependent level to iterate current loop.
callback(b, t, loopToDependencies[loop(b)][t], lvlTp,		callback(b, t, getLoopDependentLevel(b), lvlTp,
/*isIdxReduc=*/true);		/*isIdxReduc=*/true);
} else {		} else {
callback(b, t, optLvl, lvlTp, /*isIdxReduc=*/false);		callback(b, t, optLvl, lvlTp, /*isIdxReduc=*/false);
}		}
}		}
}		}

/// Sets whether the output tensor is sparse or not.		/// Sets whether the output tensor is sparse or not.
void setHasSparseOut(bool s) { hasSparseOut = s; }		void setHasSparseOut(bool s) { hasSparseOut = s; }

/// Establishes the two-way map that i <-> <t, lvl>.		/// Establishes the two-way map that i <-> <t, lvl>.
void setLoopDependentTensorLevel(LoopId i, TensorId t, Level lvl) {		void setLoopDependentTensorLevel(LoopId i, TensorId t, Level lvl,
		aartbik (Done): Since you added a parameter, you also need to update this comment.
assert(isValidLoopId(i) && isValidLevel(t, lvl));		DimLevelType dlt) {
		wrengr (Done): That's the wrong bound to assert; you should instead compare against `levelToDependentIdx[t].size()` or `maxLvlRank`. For correctness you should also `assert(i < numLoops && t < numTensors)`. If you rebase over D146684 to get the assertion helpers, then the full assertion would be `assert(isValidLevel(t, lvl) && isValidLoopId(i) && !loopToDependencies[i][t].has_value())`. The nice thing about those assertion helpers is that they give the correct bound for the level (rather than falling back to `maxLvlRank`), and also guard against the case where `i == kInvalidId`.
		Peiming (Author, Done): Yeah, I haven't rebased against the change that introduced `maxLvlRank` yet.
loopToDependencies[i][t] = lvl;		assert(isValidLoopId(i) && isValidLevel(t, lvl) &&
		wrengr (Done): Isn't it redundant to store the level-type in `loopToDependencies`, since it's already stored in `lvlTypes`? That is, afaict the following code snippet should always return successfully (assuming `i` and `t` are valid): `if (const auto dep = loopToDependencies[i][t]) { const Level depLvl = (*dep).first; const auto depOptLoop = lvlToLoop[t][depLvl]; assert(depOptLoop); const LoopId depLoop = *depOptLoop; assert(lvlTypes[t][depLoop] == (*dep).second); }`
		Assuming that's correct, then you shouldn't store the level-type in `loopToDependencies` because it's redundant information. Or if you absolutely must store the redundant copy for some reason, then you need to verify that the level-type agrees with the one in `lvlTypes` (or if the one in `lvlTypes` is undefined, then you need to store the `dlt` parameter there too; and conversely you need to adjust the `setLevelAndType` method to also verify consistency with `loopToDependencies`).
		I agree that it's rather convoluted to need to say `lvlTypes[t][*(lvlToLoop[t][*(loopToDependencies[i][t])])]`, but the only solution to that is to reconsider the design of all the fields of the `Merger` class. For example, if we had `lvlTypes : (TensorId, Level) -> LevelType` instead of the current `lvlTypes : (TensorId, LoopId) -> LevelType`, then you wouldn't need to use `lvlToLoop` there. Of course, you'd have to redo the rest of the `Merger` code to make that work (which may end up inserting more `loopToLevel` uses than however many `levelToLoop` uses it removes). Or a different design would be to combine `lvlToLoop` and `lvlTypes` into a single `lvlInfo : (TensorId, Level) -> (LevelType, optional<LoopId>)`; of course that would require making different changes to the rest of the `Merger` code.
		In any case, I think it would be wise to wait for D146693 to land before trying to make any of these changes, since the newtypes of that CL will greatly simplify the process of rearranging all these vectors.
		Peiming (Author, Done): No, it is not redundant. `lvlTypes` is currently a mapping `(tid, loopid) => dlt`, not `(tid, level) => dlt`. I think storing a pair makes more sense than introducing a complete `(tid, level) => dlt` map here, because the dlt is only required here when there are non-trivial index expressions on the level.
		wrengr (Done): There is a lot of unnecessary complexity and redundancy in storing all of:
		- `lvlToLoop : (tid, lvl) -> optional<loopid>`
		- `loopToLvl : (tid, loopid) -> optional<lvl>`
		- `lvlTypes : (tid, loopid) -> dlt`
		- `loopToDependencies : (loopid, tid) -> (lvl, dlt)`
		- `levelToDependentLoop : (tid, lvl) -> set<loopid>`
		As I mentioned in my earlier comment, we can easily reconstruct the desired `(tid, lvl) -> dlt` map via `[](t, l) { auto i = lvlToLoop[t][l]; return i ? lvlTypes[t][*i] : Undef; }`. Therefore, we always have that `loopToDependencies[i][t] == make_pair(l, reconstructedLvlTypes[t][l])`. Consequently: since the first part of `loopToDependencies` has that every `(t,i)` pair determines `l`, it is trivial to construct the required `(t,l)` pair for passing to `reconstructedLvlTypes`; and since it's trivial to define `reconstructedLvlTypes`, there is no benefit to storing this redundant information. And as I said before, whenever we store redundant information, we must also take pains to ensure that all the copies of that information remain consistent.
		I agree that it would be nice to store the `(tid, lvl) -> dlt` map directly, and to use that in lieu of the current `(tid, loopid) -> dlt` map. Especially since the former can be quickly constructed from the types of the tensors, and doesn't require knowing anything about `lvlToLoop`/`loopToLvl`. However, regardless of which one we store, the point remains the same: there's no benefit to `loopToDependencies` storing this information redundantly, and if it stores redundant information anyway then it needs to ensure that it remains consistent with the `(tid, {lvl,loopid}) -> dlt` map.
		Peiming (Author, Done): I agree that we can probably clean it up, but I will address this in separate patches though ;-)
levelToDependentIdx[t][lvl].push_back(i);		lvl < levelToDependentLoop[t].size());
		// Must be the first time we define it.
		aartbik (Done): Top-level comments usually apply to the whole block of code. I find e.g. `assert(!loopToDependencies[i][t].has_value()); // must be first definition` a lot more readable, since you follow it directly with the make_pair/push_back.
		aartbik (Done): Can we make this `lvl < ...` part a helper method too? (That way, all our asserts read almost like English ;-)
		Peiming (Author, Done): I found that this is actually a redundant check: if the lvl is valid, then it is definitely in bounds.
		wrengr (Done): Why do you need this extra assertion? The `isValidLevel` assertion already ensures that the `lvl` is valid for the tensor `t`. Therefore, rather than checking the `lvl` twice, the rest of the code should instead maintain the invariant that `levelToDependentLoop[t].size() == lvlTypes[t].size()`.
		assert(!loopToDependencies[i][t].has_value());
		loopToDependencies[i][t] = std::make_pair(lvl, dlt);
		levelToDependentLoop[t][lvl].push_back(i);
}		}
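
Editor's sketch of the reconstruction wrengr describes in the thread above: deriving a `(tid, lvl) -> dlt` view from `lvlToLoop` and `lvlTypes`, so that `loopToDependencies` only needs to store the level. Everything here is a stand-in for illustration (`MergerSketch`, the `unsigned` typedefs, and the reduced `DimLevelType` enum are assumptions, not the real `Merger` API). Note that it returns `Undef` whenever `lvlToLoop` has no entry for the level, which is exactly the case Peiming raises for levels with non-trivial index expressions.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Stand-in types, assumed for illustration only.
using TensorId = unsigned;
using LoopId = unsigned;
using Level = unsigned;
enum class DimLevelType { Undef, Dense, Compressed, Singleton };

struct MergerSketch {
  // (tid, lvl) -> optional<loopid>
  std::vector<std::vector<std::optional<LoopId>>> lvlToLoop;
  // (tid, loopid) -> dlt
  std::vector<std::vector<DimLevelType>> lvlTypes;
  // (loopid, tid) -> optional<lvl>; no level-type is stored here.
  std::vector<std::vector<std::optional<Level>>> loopToDependencies;

  // Reconstructs the (tid, lvl) -> dlt view from the two maps above.
  // Yields Undef when the level has no associated loop.
  DimLevelType reconstructedLvlType(TensorId t, Level l) const {
    assert(t < lvlToLoop.size() && l < lvlToLoop[t].size());
    const std::optional<LoopId> i = lvlToLoop[t][l];
    return i ? lvlTypes[t][*i] : DimLevelType::Undef;
  }

  // Level-type of the level that loop `i` depends on in tensor `t`,
  // or nullopt when the loop has no dependency on that tensor.
  std::optional<DimLevelType> dependentLvlType(LoopId i, TensorId t) const {
    assert(i < loopToDependencies.size() && t < loopToDependencies[i].size());
    const std::optional<Level> dep = loopToDependencies[i][t];
    if (!dep)
      return std::nullopt;
    return reconstructedLvlType(t, *dep);
  }
};
```

With this shape there is only one copy of each level-type, so there is no consistency check to maintain between `loopToDependencies` and `lvlTypes`.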

/// Whether the loop has dependent slice.		/// Whether the loop has dependent slice.
bool hasDependentLvl(LoopId i, TensorId t) {		bool hasDependentLvl(LoopId i, TensorId t) {
assert(isValidTensorId(t) && isValidLoopId(i));		assert(isValidTensorId(t) && isValidLoopId(i));
return loopToDependencies[i][t].has_value();		return loopToDependencies[i][t].has_value();
}		}

/// Returns the list of loop indices which appear in the non-trivial index		/// Returns the list of loop indices which appear in the non-trivial index
		aartbik (Not Done): The non-trivial concept is used more widely now, but it would still be nice to define, per file or at least at first occurrence, what non-trivial really means. Or perhaps we should start using more standard terminology for affine expressions?
/// expression on t_l, e.g., A[i+j] => {i, j}		/// expression on t_l, e.g., A[i+j] => {i, j}
std::vector<LoopId> &getDependentLoops(TensorId t, Level lvl) {		std::vector<LoopId> &getDependentLoops(TensorId t, Level lvl) {
assert(isValidLevel(t, lvl));		assert(isValidLevel(t, lvl));
return levelToDependentIdx[t][lvl];		return levelToDependentLoop[t][lvl];
}		}

/// Returns the defining [tid, lvl] for the loop.		/// Returns the defining [tid, lvl] for the loop.
std::pair<TensorId, Level> getLoopDefiningLvl(LoopId i) const {		std::pair<TensorId, Level> getLoopDefiningLvl(LoopId i) const {
assert(isValidLoopId(i));		assert(isValidLoopId(i));
return loopBounds[i];		return loopBounds[i];
}		}

/// Checks whether the TensorLoopId represents a tensor level with		/// Checks whether the TensorLoopId represents a tensor level with
/// non-trivial index expression on it.		/// non-trivial index expression on it.
bool isLvlWithNonTrivialIdxExp(TensorLoopId b) const {		bool isLvlWithNonTrivialIdxExp(TensorLoopId b) const {
const TensorId t = tensor(b);		const TensorId t = tensor(b);
const LoopId i = loop(b);		const LoopId i = loop(b);
assert(isValidTensorId(t) && isValidLoopId(i));		assert(isValidTensorId(t) && isValidLoopId(i));
return loopToDependencies[i][t].has_value();		return loopToDependencies[i][t].has_value();
}		}

		Level getLoopDependentLevel(TensorLoopId b) const {
		assert(isLvlWithNonTrivialIdxExp(b));
		return loopToDependencies[loop(b)][tensor(b)]->first;
		}

		DimLevelType getLoopDependentLevelType(TensorLoopId b) const {
		assert(isLvlWithNonTrivialIdxExp(b));
		return loopToDependencies[loop(b)][tensor(b)]->second;
		}

		/// Checks whether the TensorLoopId represents a tensor level with
		/// non-trivial index expression on it.
		aartbik (Done): "tensor level with index expression on it" reads awkwardly. How about "must be a tensor level that contains a non-trivial index expression"?
		bool isSparseLvlWithNonTrivialIdxExp(TensorLoopId b) const {
		if (isLvlWithNonTrivialIdxExp(b)) {
		auto dlt = getLoopDependentLevelType(b);
		return isCompressedDLT(dlt) || isSingletonDLT(dlt);
		}
		return false;
		}

/// Convenience getters to immediately access the stored nodes.		/// Convenience getters to immediately access the stored nodes.
/// These methods return `const&` because the underlying objects must		/// These methods return `const&` because the underlying objects must
/// not be mutated by client code. The only exception is for mutating		/// not be mutated by client code. The only exception is for mutating
/// the value associated with an expression, for which there are		/// the value associated with an expression, for which there are
/// dedicated methods below.		/// dedicated methods below.
///		///
/// NOTE: It is inadvisable to keep the reference alive for a long		/// NOTE: It is inadvisable to keep the reference alive for a long
/// time (e.g., as in `TensorExpr &te = merger.exp(e)`), since insertions		/// time (e.g., as in `TensorExpr &te = merger.exp(e)`), since insertions
[128 lines hidden]	private:
std::vector<std::vector<std::optional<Level>>> loopToLvl;		std::vector<std::vector<std::optional<Level>>> loopToLvl;

// Map that converts pair<TensorId, Level> to the corresponding LoopId.		// Map that converts pair<TensorId, Level> to the corresponding LoopId.
std::vector<std::vector<std::optional<LoopId>>> lvlToLoop;		std::vector<std::vector<std::optional<LoopId>>> lvlToLoop;

// Map from a loop to its dependencies if any.		// Map from a loop to its dependencies if any.
// The dependencies of a loop is a set of (tensor, level) pairs.		// The dependencies of a loop is a set of (tensor, level) pairs.
// It is currently only set for non-trivial index expressions.		// It is currently only set for non-trivial index expressions.
// E.g., A[i+j] => i and j will have dependencies {A0} to indicate that		// E.g., A[i+j] => i and j will have dependencies {A0, dlt(A0)} to indicate
// i and j are used in the non-trivial index expression on A0.		// that i and j are used in the non-trivial index expression on A0.
std::vector<std::vector<std::optional<Level>>> loopToDependencies;		std::vector<std::vector<std::optional<std::pair<Level, DimLevelType>>>>
		loopToDependencies;

// The inverse map of ldxToDependencies from tensor level -> dependent loop		// The inverse map of ldxToDependencies from tensor level -> dependent loop
// E.g., A[i+j], we have A0 => {i, j}, to indicate that A0 uses both {i, j}		// E.g., A[i+j], we have A0 => {i, j}, to indicate that A0 uses both {i, j}
// to compute its indices.		// to compute its indices.
std::vector<std::vector<std::vector<LoopId>>> levelToDependentIdx;		std::vector<std::vector<std::vector<LoopId>>> levelToDependentLoop;
		wrengr (Done): This should have been named `levelToDependentLoop`, since we don't use "idx" in this file anymore because it causes too much confusion. I mentioned this in the previous CL that introduced this field, but you landed the CL without fixing it.

// Map from a loop to the [tid, lvl] pair that defines the loop boundary.		// Map from a loop to the [tid, lvl] pair that defines the loop boundary.
std::vector<std::pair<TensorId, Level>> loopBounds;		std::vector<std::pair<TensorId, Level>> loopBounds;

llvm::SmallVector<TensorExp> tensorExps;		llvm::SmallVector<TensorExp> tensorExps;
llvm::SmallVector<LatPoint> latPoints;		llvm::SmallVector<LatPoint> latPoints;
llvm::SmallVector<SmallVector<LatPointId>> latSets;		llvm::SmallVector<SmallVector<LatPointId>> latSets;
};		};

} // namespace sparse_tensor		} // namespace sparse_tensor
} // namespace mlir		} // namespace mlir

#endif // MLIR_DIALECT_SPARSETENSOR_UTILS_MERGER_H_		#endif // MLIR_DIALECT_SPARSETENSOR_UTILS_MERGER_H_

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h

[76 lines hidden]	public:
using OutputUpdater = function_ref<Value(OpBuilder &builder, Location loc,		using OutputUpdater = function_ref<Value(OpBuilder &builder, Location loc,
Value memref, Value tensor)>;		Value memref, Value tensor)>;
// Map from [tid, dim] to a list of dependent [tid, dim] for affine expression		// Map from [tid, dim] to a list of dependent [tid, dim] for affine expression
// index on sparse tensors.		// index on sparse tensors.
// E.g., for affine index (d0 + d1), it depends on two [tid, dim] that defines		// E.g., for affine index (d0 + d1), it depends on two [tid, dim] that defines
// d0 and d1 (for affine expression reduction).		// d0 and d1 (for affine expression reduction).
// If the list is empty, it means that there is no affine expression on the		// If the list is empty, it means that there is no affine expression on the
// input [tid, dim].		// input [tid, dim].
		// NOTE: the order of the returned list should be consistent with the
		aartbik (Done): Comments with "should" or "must" are a bit dangling unless you say what happens when the assumption does not hold.
		aartbik (Done): I think what is still missing is whether it is enforced (viz. asserts fail when trying to set it) or whether clients are responsible. So, something like: "Clients are responsible for ensuring that the order of the returned". (I think my original comment was really about who is to blame in the end when I see "should" or "must" ;-)
		// topological order of the iteration graph.
using DependentLvlGetter =		using DependentLvlGetter =
function_ref<std::vector<std::pair<TensorId, Level>>(TensorId, Level)>;		function_ref<std::vector<std::pair<TensorId, Level>>(TensorId, Level)>;

LoopEmitter() = default;		LoopEmitter() = default;

/// Takes an array of input tensors, which the generated loops will		/// Takes an array of input tensors, which the generated loops will
/// iterate over. Each tensor is given a `TensorId` (numerically equal		/// iterate over. Each tensor is given a `TensorId` (numerically equal
/// to the position of that tensor `Value` in the array). Setting		/// to the position of that tensor `Value` in the array). Setting
[35 lines hidden]	public:
/// for (i = p0; i < end; i++)		/// for (i = p0; i < end; i++)
/// ...		/// ...
/// // loop sequence end.		/// // loop sequence end.
/// }		/// }
void enterNewLoopSeq(OpBuilder &builder, Location loc,		void enterNewLoopSeq(OpBuilder &builder, Location loc,
ArrayRef<TensorId> tids, ArrayRef<Level> lvls);		ArrayRef<TensorId> tids, ArrayRef<Level> lvls);

/// Exits the current loop sequence, this will reset universal index to 0.		/// Exits the current loop sequence, this will reset universal index to 0.
void exitCurrentLoopSeq() {		void exitCurrentLoopSeq(OpBuilder &builder, Location loc);
		aartbik (Done): Why did you change Exits -> exit? The original seems okay.
		wrengr (Done): This should remain "/// Exits".
		wrengr (Done): You want to keep the triple-slash "///", since that's what the tooling uses for generating the API documentation.
assert(loopSeqStack.size() == loopStack.size() + 1);
loopSeqStack.pop_back();
}

// TODO: Get rid of `lvls` in the argument list? Track the level we		// TODO: Get rid of `lvls` in the argument list? Track the level we
// are currently at internally. Then it would be enterNextLvlForTensor.		// are currently at internally. Then it would be enterNextLvlForTensor.
// Still need a way to specify the lvl for non-annotated tensors though,		// Still need a way to specify the lvl for non-annotated tensors though,
// as those can be accessed out of order.		// as those can be accessed out of order.
//		//
/// Emits loop over tensor_tid_lvl, it assumes that loops between		/// Emits loop over tensor_tid_lvl, it assumes that loops between
/// tensor_tid_[0, lvl - 1] have already been generated.		/// tensor_tid_[0, lvl - 1] have already been generated.
[54 lines hidden]	const std::vector<std::vector<Value>> &getCoordinateBuffers() const {
return coordinatesBuffers;		return coordinatesBuffers;
};		};
const std::vector<Value> &getValBuffer() const { return valBuffer; };		const std::vector<Value> &getValBuffer() const { return valBuffer; };

constexpr static llvm::StringLiteral getLoopEmitterLoopAttrName() {		constexpr static llvm::StringLiteral getLoopEmitterLoopAttrName() {
return llvm::StringLiteral("Emitted from");		return llvm::StringLiteral("Emitted from");
}		}

private:		private:
struct LoopInfo {		struct LoopInfo final {
		aartbik (Done): Since we have several nested structs, can you give each a short comment (as documentation, and to improve readability)?
LoopInfo(ArrayRef<TensorId> tids, ArrayRef<Level> lvls, Operation *loop,		LoopInfo(ArrayRef<TensorId> tids, ArrayRef<Level> lvls,
Block *userBlock, Value iv, StringAttr loopTag)		ArrayRef<TensorId> slicedTids, ArrayRef<Level> slicedLvls,
: tids(tids), lvls(lvls), loop(loop), userCodeBlock(userBlock), iv(iv) {		ArrayRef<bool> sliceReduced, Operation *loop, Block *userBlock,
		Value iv, StringAttr loopTag)
		: tids(tids), lvls(lvls), slicedTids(slicedTids),
		slicedLvls(slicedLvls), sliceReduced(sliceReduced), loop(loop),
		userCodeBlock(userBlock), iv(iv) {
// Attached a special tag to loop emitter generated loop.		// Attached a special tag to loop emitter generated loop.
if (loopTag)		if (loopTag)
loop->setAttr(LoopEmitter::getLoopEmitterLoopAttrName(), loopTag);		loop->setAttr(LoopEmitter::getLoopEmitterLoopAttrName(), loopTag);
}		}
// TODO: maybe use a vector<pair> for tid and lvl?		// TODO: maybe use a vector<pair> for tid and lvl?
// (Better yet, compress them together a la `TensorLoopId`.)		// (Better yet, compress them together a la `TensorLoopId`.)
// The set of tensors that the loop is operating on		// The set of tensors that the loop is operating on
const llvm::SmallVector<TensorId> tids;		const llvm::SmallVector<TensorId> tids;
// The corresponding levels for the tensors		// The corresponding levels for the tensors
const llvm::SmallVector<Level> lvls;		const llvm::SmallVector<Level> lvls;
		// The set of tensors for slice-driven loop conditions.
		aartbik (Done): Here and below (and above): period at end.
		wrengr (Done): This comment should explain how exactly `slicedTids` differs from the other `tids`. Also, is the same tensor allowed to occur in both fields? If so, then what does that mean? And can the same level of the same tensor occur in both of the corresponding fields?
		const llvm::SmallVector<TensorId> slicedTids;
		// The corresponding level for slice-driven tensors.
		wrengr (Done): "levels"
		const llvm::SmallVector<Level> slicedLvls;
		// Whether the tensor is fully reduced (e.g., i + j => j).
		wrengr (Done): This comment is wrong for this field.
		const llvm::SmallVector<bool> sliceReduced;
		wrengr (Done): I think it'd be better to combine these all into a single `const SmallVector<LoopLevelInfo>` where `struct LoopLevelInfo final { TensorId tid; Level lvl; bool isSlice; bool isReduced; };` (assuming it's okay to combine the original `(tids,lvls)` with the new `(slicedTids,slicedLvls,sliceReduced)` into a single vector/set). If that's not okay for some reason, then I still think it'd be good to use a single `const SmallVector<LoopSlicedLevelInfo>` field for the new stuff. Using AOS will ensure that there's the right number of all the things that should correspond, as well as keeping the corresponding things close together. Plus it'll make it easier to add additional fields in the future as needed.
		Peiming (Author, Done): I agree with you on this; in fact, see my comment at L222. I will do it in a separate patch though.
const Operation *loop; // the loop operation		const Operation *loop; // the loop operation
Block *const userCodeBlock; // the block holding users' generated code.		Block *const userCodeBlock; // the block holding users' generated code.
const Value iv; // the induction variable for the loop		const Value iv; // the induction variable for the loop
};		};
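
Editor's sketch of the array-of-structs layout wrengr suggests in the thread above, as a rough illustration rather than the eventual patch. `LoopLevelInfo` and its four fields are taken from the review comment; `LoopInfoSketch`, `numSliced`, the use of `std::vector` in place of `llvm::SmallVector`, and the `unsigned` typedefs are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in types, assumed for illustration only.
using TensorId = unsigned;
using Level = unsigned;

// One record per (tensor, level) touched by a loop. Keeping the fields in
// one struct means the per-level data cannot get out of sync in length.
struct LoopLevelInfo final {
  TensorId tid;
  Level lvl;
  bool isSlice;
  bool isReduced;
};

// Hypothetical replacement for the five parallel vectors in `LoopInfo`.
struct LoopInfoSketch final {
  explicit LoopInfoSketch(std::vector<LoopLevelInfo> levels)
      : levels(std::move(levels)) {}

  // Counts the entries that come from slice-driven loop conditions.
  unsigned numSliced() const {
    unsigned n = 0;
    for (const LoopLevelInfo &info : levels)
      if (info.isSlice)
        ++n;
    return n;
  }

  const std::vector<LoopLevelInfo> levels;
};
```

Compared with separate `tids`, `lvls`, `slicedTids`, `slicedLvls`, and `sliceReduced` vectors, a new per-level attribute becomes one extra struct field instead of another vector that must be kept aligned.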

/// Linearizes address for dense level (i.e., p = (i * d0) + j).		struct SliceInfo final {
		// Note that we do not need to create a actual sparse tensor slice but
		wrengr (Done): "...do not need to actually create a sparse..."
		// instead only need to maintain the metadata of the slice.
		wrengr (Done): "...only need to maintain the..."
		SliceInfo(Value minCrd, Value offset, Value isNonEmpty,
		wrengr (Done): Is this actually the full `MemRef` of coordinates for all levels? If so then it should be "minCoords" (with an "s"). Whereas if it's just a single coordinate for the given level, then it should be "minCrd".
		std::optional<Level> slicedOnLvl, unsigned depth)
		wrengr (Done): I'm guessing this should be `LoopOrd`. If not, then what is it?
		: minCrd(minCrd), offset(offset), isNonEmpty(isNonEmpty),
		slicedOnLvl(slicedOnLvl), depth(depth) {
		// TODO: use std::optional<pair<Level, minCrd>>
		wrengr (Done): Given our discussion of boolean blindness, this assertion suggests that the two parameters should be combined into a single `std::optional<std::pair<Level, Value>>` parameter. (Albeit you'll still want to assert you didn't get a null `Value`.) If that combined parameter doesn't work, why not? The only other thing that would be consistent with the assertion is `union{ non-null-Value; struct{non-null-Value; Level}}` but that's equivalent to `struct{non-null-Value; optional<Level>}`, so if that's what you want then you should `assert(minCoord)` instead.
		Peiming (Author, Done): It should work. I will add a TODO here and submit the change in a separate patch.
		assert(!slicedOnLvl || minCrd);
		}

		// Whether this is the tensor that has not yet been sliced.
		wrengr (Done): Given the comment, I think this would be better named something like "isFirstSlice" or "isInitialSlice".
		Peiming (Author, Done): I will change the comment; it means whether this is the initial tensor that has not yet been sliced.
		bool isInitialTensor() const { return !slicedOnLvl.has_value(); }

		wrengr (Done): If the value is just a single coordinate, then this should also be singular.
		Value minCrd; // the minimal coordinate of the slice.
		Value offset; // the offset of the current slice.
		Value isNonEmpty; // whether the slice is empty.
		std::optional<Level> slicedOnLvl; // the level on which the slice is done
		aartbik (Done): Wren can correct me if I am wrong, but I think this needs to be "minimum", right (as in smallest value, and not the lowest according to some other measure)?
		unsigned depth; // the depth (relative to dependentDimMap[tid][lvl]).
		};
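
Editor's sketch of the combined parameter discussed in the thread above (the `TODO: use std::optional<pair<Level, minCrd>>`). It shows how pairing the level with its minimum coordinate makes the "sliced implies a coordinate is present" relationship part of the type, leaving only the null check on the `Value`. The `Value` struct below is a hypothetical stand-in for `mlir::Value`, and `SliceInfoSketch` and `slicedLvlAndMinCrd` are illustrative names, not the names used in the patch.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Stand-in types, assumed for illustration only.
using Level = unsigned;

// Hypothetical stand-in for mlir::Value: a nullable handle testable as bool.
struct Value {
  const void *impl = nullptr;
  explicit operator bool() const { return impl != nullptr; }
};

struct SliceInfoSketch final {
  SliceInfoSketch(Value offset, Value isNonEmpty,
                  std::optional<std::pair<Level, Value>> slicedLvlAndMinCrd,
                  unsigned depth)
      : offset(offset), isNonEmpty(isNonEmpty),
        slicedLvlAndMinCrd(std::move(slicedLvlAndMinCrd)), depth(depth) {
    // The pair guarantees a coordinate accompanies the level; it can still
    // be a null handle, so that part is checked explicitly.
    assert(!this->slicedLvlAndMinCrd || this->slicedLvlAndMinCrd->second);
  }

  // Whether this is the initial tensor that has not yet been sliced.
  bool isInitialTensor() const { return !slicedLvlAndMinCrd.has_value(); }

  Value offset;
  Value isNonEmpty;
  std::optional<std::pair<Level, Value>> slicedLvlAndMinCrd;
  unsigned depth;
};
```

A caller constructing the initial, unsliced tensor would pass `std::nullopt`, and a sliced one would pass `std::make_pair(lvl, minCrd)`.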

		/// Linearizes address for dense dimension (i.e., p = (i * d0) + j).
		wrengr (Done): Can you make this class "`final`" (ditto for the `LoopInfo`). You don't need/use subclassing here, and marking it final helps the compiler generate more efficient code.
Value genAddress(OpBuilder &builder, Location loc, TensorId tid, Level lvl,		Value genAddress(OpBuilder &builder, Location loc, TensorId tid, Level lvl,
Value iv);		Value iv);

/// Generates the segment high for a non-unique level (to fast forward		/// Generates the segment high for a non-unique level (to fast forward
/// duplicated coordinates). That is, it generates the code:		/// duplicated coordinates). That is, it generates the code:
///		///
/// crd = coordinates_tid_lvl[pos]		/// crd = coordinates_tid_lvl[pos]
/// while (pos < pHi && coordinates_tid_lvl[pos] == crd)		/// while (pos < pHi && coordinates_tid_lvl[pos] == crd)
[37 lines hidden]	private:

/// Emits extra locals, since the locals might not be in simplified lattices		/// Emits extra locals, since the locals might not be in simplified lattices
/// point used to generate the loops, but are still required to generate		/// point used to generate the loops, but are still required to generate
/// expressions.		/// expressions.
void emitExtraLocalsForTensorsAtDenseLvls(OpBuilder &builder, Location loc,		void emitExtraLocalsForTensorsAtDenseLvls(OpBuilder &builder, Location loc,
ArrayRef<TensorId> tids,		ArrayRef<TensorId> tids,
ArrayRef<Level> lvls);		ArrayRef<Level> lvls);

		Operation *emitForLoopOverTensorAtLvl(OpBuilder &builder, Location loc,
		TensorId tid, Level lvl,
		wrengr (Done): This should be `Level lvl` (I'm pretty sure).
		MutableArrayRef<Value> reduc,
		bool isParallel);

/// Exits a for loop, returns the reduction results, e.g.,		/// Exits a for loop, returns the reduction results, e.g.,
/// For sequential for loops:		/// For sequential for loops:
/// %ret = for () {		/// %ret = for () {
/// ...		/// ...
		aartbik (Done): "is exceeds" -> "exceeds", but more importantly, I would state this as: "We break out of the loop when the coordinate exceeds the sliceSize."
/// %val = addi %args, %c		/// %val = addi %args, %c
/// yield %val		/// yield %val
/// }		/// }
/// For parallel loops, the following generated code by users:		/// For parallel loops, the following generated code by users:
/// %ret = parallel () init(%args) {		/// %ret = parallel () init(%args) {
/// ...		/// ...
/// %val = op %args, %c		/// %val = op %args, %c
/// }		/// }
[43 lines hidden]	if (const auto reassoc = collapseReassoc[tid]) {
llvm::map_range(srcLvls, [&](Attribute srcLvl) -> Level {		llvm::map_range(srcLvls, [&](Attribute srcLvl) -> Level {
// TODO: replace this with the converter for `LevelAttr`.		// TODO: replace this with the converter for `LevelAttr`.
return srcLvl.cast<IntegerAttr>().getValue().getZExtValue();		return srcLvl.cast<IntegerAttr>().getValue().getZExtValue();
}));		}));
}		}
return {dstLvl};		return {dstLvl};
}		}

		//
		// Slice-driven loop related methods.
		//

		/// Retrieves the most recent slices on lvl. To reduce affine expression like
		aartbik (Done): "the most recent slice" (singular)
		/// d0 + d1 + d2, we need two slices (one of size d1 + d2, and the other of
		/// size d2). This methods returns the latter slice (of size d2), which is
		wrengr (Done): This description doesn't make sense to me. Do you have a design doc that explains what exactly you mean by "slice" in this context, and explains how/why "reducing `d0+d1+d2`" translates into needing the two slices you mention?
		Peiming (Author, Done): Yeah, I am writing a paper on this (but it is still at a very early stage). I will share it with you later when it is more or less complete.
		/// also the final slice on the level.
		SliceInfo &getFinalSliceOnLvl(TensorId tid, Level lvl);
		aartbik: const ref?

		/// Get the total number of constraints needed to fully resolve
		wrengr: Either "number of constraints needed to..." or "number of constraints that are needed to..."
		/// dependent levels on tensor[tid].
		wrengr: "level"
		size_t remDepOnLevel(TensorId tid, Level lvl) const;
		aartbik: Perhaps we should discuss this somewhere else, but we use "unsigned" in most places, and size_t only for local operations, or inside casts and asserts. Since this is part of the API, I would prefer keeping it unsigned, unless you have very strong reasons for this.

		/// Whether the tid, lvl is fully reduced, i.e., the non-trivial index
		/// expression has been reduced to a trivial one.
		/// E.g., A[i+j] => A[2 + i] (j is reduced)
		wrengr: That should be "A[i+j] => A[i+2]" to make it clear that it derives from "j => 2".
		bool depFullyReduced(TensorId tid, Level lvl) const {
		return remDepOnLevel(tid, lvl) == 1;
		}

		/// Whether the tid, lvl is fully resolved, i.e., we entered the level already
		/// (the index on that level is determined).
		/// E.g., A[i+j] => A[2 + 3] (both i and j become invariants for inner loops).
		bool lvlFullyResolved(TensorId tid, Level lvl) const {
		return remDepOnLevel(tid, lvl) == 0;
		}
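The two predicates above can be modeled on plain counters. This is a minimal sketch, assuming that `remDepOnLevel` is "total dependencies minus dependencies reduced so far"; its body is not shown in this diff, and the `...Model` names and the struct are hypothetical stand-ins, not LoopEmitter API.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-(tensor, level) bookkeeping. In the diff
// the corresponding data lives in `dependentLvlMap` and `levelReducedDep`.
struct LevelDepsModel {
  unsigned totalDeps;   // number of loop variables in the index expression
  unsigned reducedDeps; // how many of them have been fixed so far
};

// Assumption: remaining dependencies = total - reduced.
inline unsigned remDepOnLevelModel(const LevelDepsModel &d) {
  assert(d.reducedDeps <= d.totalDeps);
  return d.totalDeps - d.reducedDeps;
}

// One variable left: A[i+j] has become A[i+2].
inline bool depFullyReducedModel(const LevelDepsModel &d) {
  return remDepOnLevelModel(d) == 1;
}

// No variable left: A[i+j] has become A[2+3].
inline bool lvlFullyResolvedModel(const LevelDepsModel &d) {
  return remDepOnLevelModel(d) == 0;
}
```

For `A[i+j]` (two dependencies), the level is "fully reduced" once one variable is fixed and "fully resolved" once both are.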

		/// Generates a whileOp to iterate over a subset of coordinates on tid on lvl
		/// using the pHi and pLo provided, the loop break on the first coordinate
		/// that exceeds the slice boundary (i.e., coord >= slice.offset +
		/// slice.size).
		std::pair<Operation *, ValueRange>
		genSliceLvlTraverseLoop(OpBuilder &builder, Location loc, Value pLo,
		Value pHi, Value offset, TensorId tid, Level lvl,
		size_t depth, ValueRange userReduc, bool genYield,
		wrengr: Should this be `LoopOrd`? If not, then what is it?
		Peiming (Author): This is an `unsigned` index into another array (not the loop sequence). I will stick with this.
		/*bodyBuilder=*/
		llvm::function_ref<void(OpBuilder &, Location, Value,
		MutableArrayRef<Value>)>);

		/// Generates a nested loop that iterates over tid on all the coordinates on
		/// lvl.
		ValueRange genUnResolvedSliceTreeTraverse(
		OpBuilder &builder, Location loc, Value offset, TensorId tid, Level lvl,
		size_t depth, ValueRange userReduc,
		/*bodyBody=*/
		llvm::function_ref<void(OpBuilder &, Location, Value,
		MutableArrayRef<Value>)>);

		/// Generates code to get the first non-empty slice of tid on lvl, when all
		/// the previous level before `lvl` are resolved (or lvl is the first level).
		///
		/// This is the simple case because the previous level are resolved into a
		/// single node in the storage tree.
		void genResolvedSliceBegin(OpBuilder &builder, Location loc, TensorId tid,
		Level lvl);

		/// Generates code to get the first non-empty slice of tid on lvl, when
		wrengr: What is this supposed to be: `LoopOrd`, `LoopId`, other?
		Peiming (Author): This is an `unsigned` counter.
		/// the previous levels before `lvl` are unresolved
		///
		/// This is the complex case because the previous levels corresponding to a
		/// range of nodes in the storage tree.
		void genUnResolvedSliceBegin(OpBuilder &builder, Location loc, TensorId tid,
		Level lvl);

		/// Generates code to get the first non-empty slice of tid on lvl.
		/// return true if has already been resolved.
		bool genSliceBegin(OpBuilder &builder, Location loc, TensorId tid, Level lvl);

		/// Generates code to get the next non-empty slices of tid on lvl.
		void genSliceNextInduction(OpBuilder &builder, Location loc,
		const Operation *whileOp, TensorId tid, Level lvl,
		SmallVectorImpl<Value> &operands,
		unsigned &retIdx);

		/// Generates a slice-driven while loop like follows.
		///
		aartbik: as follows?
		/// curSlice = getFirstNonEmptySlice(tensor).
		///
		/// while(isNonEmpty) {
		/// ..user code..
		/// isNonEmpty, curSlice = getNextNonEmptySlice(curSlice)
		/// }
		Operation *emitSliceDrivenLoopOverTensorAtLvl(OpBuilder &builder,
		Location loc, TensorId tid,
		Level lvl,
		MutableArrayRef<Value> reduc);

/// A optional string attribute that should be attached to the loop		/// A optional string attribute that should be attached to the loop
/// generated by loop emitter, it might help following passes to identify		/// generated by loop emitter, it might help following passes to identify
/// loops that operates on sparse tensors more easily.		/// loops that operates on sparse tensors more easily.
StringAttr loopTag;		StringAttr loopTag;
/// Whether the loop emitter needs to treat the last tensor as the output		/// Whether the loop emitter needs to treat the last tensor as the output
/// tensor.		/// tensor.
bool hasOutput;		bool hasOutput;
bool isSparseOut;		bool isSparseOut;

		/// The insertion point to allocate top level local
		aartbik: Period at end. "to allocate top level local" makes very little sense when read in isolation. Just say what code fragment this points to.
		Operation *localInsertPos;

//		//
// Fields which have `numTensor` many entries.		// Fields which have `numTensor` many entries.
//		//
// TODO: switch to an AOS style to avoid any possible mismatches.		// TODO: switch to an AOS style to avoid any possible mismatches.
//		//

/// Input and (optional) output tensors.		/// Input and (optional) output tensors.
std::vector<Value> tensors;		std::vector<Value> tensors;
[19 lines not shown]	private:
// The segment upper bound for non-uniques level after de-duplication.		// The segment upper bound for non-uniques level after de-duplication.
std::vector<std::vector<Value>> segHi;		std::vector<std::vector<Value>> segHi;
std::vector<std::vector<Value>> highs;		std::vector<std::vector<Value>> highs;
std::vector<std::vector<Value>> lvlSizes;		std::vector<std::vector<Value>> lvlSizes;
std::vector<std::vector<Value>> positionsBuffers; // to_positions		std::vector<std::vector<Value>> positionsBuffers; // to_positions
std::vector<std::vector<Value>> coordinatesBuffers; // to_coordinates		std::vector<std::vector<Value>> coordinatesBuffers; // to_coordinates
std::vector<Value> valBuffer; // to_value		std::vector<Value> valBuffer; // to_value

		//
		// Slice-driven loops related fields.
		//

/// Whether the sparse input is a slice.		/// Whether the sparse input is a slice.
std::vector<bool> isSparseSlices;		std::vector<bool> isSparseSlices;
/// Values related to slices.		/// Values related to slices.
std::vector<std::vector<Value>> sliceOffsets;		std::vector<std::vector<Value>> sliceOffsets;
std::vector<std::vector<Value>> sliceStrides;		std::vector<std::vector<Value>> sliceStrides;

// Map from [tid, level] to a list of dependent [tid, level].		// Map from [tid, level] to a list of dependent [tid, level].
// See comments for `DependentDimGetter`.		// See comments for `DependentDimGetter`.
std::vector<std::vector<std::vector<std::pair<TensorId, Level>>>>		std::vector<std::vector<std::vector<std::pair<TensorId, Level>>>>
dependentLvlMap;		dependentLvlMap;

		// The cached position buffer for the slices, they serve the same purpose as
		// ptrBuffer for compressed dimensions.
		// But they always starts with the first pidx pointing to coord > slice.offset
		// to avoid iteration from the beginning.
		std::vector<std::vector<std::vector<Value>>> slicePosBuffer;

		// The cached size for each slices.
		std::vector<std::vector<std::vector<Value>>> sliceSizes;

		// The number of reduced dependencies on a tensor level so far.
		std::vector<std::vector<unsigned>> levelReducedDep;

		// sliceStack[tid] holds the generated slice stack on tid.
		std::vector<std::vector<SliceInfo>> sliceStack;

//		//
// View based reshape related-fields and methods		// View based reshape related-fields and methods
//		//

/// Collapse Reassociations related to a specific tensor		/// Collapse Reassociations related to a specific tensor
// TODO: support expand.		// TODO: support expand.
std::vector<ArrayAttr> collapseReassoc;		std::vector<ArrayAttr> collapseReassoc;

/// TODO: not yet used, it should track the current level for each tensor		/// TODO: not yet used, it should track the current level for each tensor
/// to help eliminate `lvls` paramters from above APIs.		/// to help eliminate `lvls` paramters from above APIs.
/// std::vector<Level> curLvl;		/// std::vector<Level> curLvl;

//		//
// Fields which have at most `numLoops` many entries.		// Fields which have at most `numLoops` many entries.
//		//

/// Loop Stack, stores the information of all the nested loops that are		/// Loop Stack, stores the information of all the nested loops that are
/// alive.		/// alive.
std::vector<LoopInfo> loopStack;		std::vector<LoopInfo> loopStack;

/// Loop Sequence Stack, stores the universal index for the current loop		// Loop Sequence Stack, stores the unversial index for the current loop
/// sequence.		// sequence. and a list of tids which was taken sliced.
std::vector<Value> loopSeqStack;		// TODO: maybe we should have a LoopSeqInfo
		std::vector<std::pair<Value, std::vector<std::tuple<TensorId, Level, bool>>>>
		loopSeqStack;

/// Maps `LoopId` (used by `AffineDimExpr`) to `LoopOrd` (in the `loopStack`).		/// Maps `LoopId` (used by `AffineDimExpr`) to `LoopOrd` (in the `loopStack`).
/// TODO: We should probably use a callback function here to make it more		/// TODO: We should probably use a callback function here to make it more
/// general.		/// general.
std::vector<LoopOrd> loopIdToOrd;		std::vector<LoopOrd> loopIdToOrd;
};		};

} // namespace sparse_tensor		} // namespace sparse_tensor
} // namespace mlir		} // namespace mlir

#endif // MLIR_DIALECT_SPARSETENSOR_TRANSFORMS_SPARSETENSORLOOPEMITTER_H_		#endif // MLIR_DIALECT_SPARSETENSOR_TRANSFORMS_SPARSETENSORLOOPEMITTER_H_

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp

[19 lines not shown]

using namespace mlir;		using namespace mlir;
using namespace mlir::sparse_tensor;		using namespace mlir::sparse_tensor;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// File local helper functions.		// File local helper functions.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// Generates a position/coordinate load from the sparse storage scheme.		#define CMPI(p, l, r) \
/// Narrower data types need to be zero extended before casting the		(builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::p, l, r) \
/// value into the `Index` type used for looping and indexing.		.getResult())

		#define C_IDX(v) (constantIndex(builder, loc, v))

		/// Generates a pointer/index load from the sparse storage scheme. Narrower
		/// data types need to be zero extended before casting the value into the
		/// index type used for looping and indexing.
static Value genIndexLoad(OpBuilder &builder, Location loc, Value mem,		static Value genIndexLoad(OpBuilder &builder, Location loc, Value mem,
Value s) {		Value s) {
		wrengr: I'd prefer these be defined as `static inline` functions rather than as macros, since that gives better type-safety and compiler error messages. If you need to use CMPI at several different types, then just use a template.
		Peiming (Author): I will stick with this. The reason I use a macro is that I want to avoid typing `builder` and `loc`.
// For the scalar case, we simply zero extend narrower indices into 64-bit		// For the scalar case, we simply zero extend narrower indices into 64-bit
// values before casting to index without a performance penalty. Here too,		// values before casting to index without a performance penalty. Here too,
// however, indices that already are 64-bit, in theory, cannot express the		// however, indices that already are 64-bit, in theory, cannot express the
// full range as explained above.		// full range as explained above.
Value load = builder.create<memref::LoadOp>(loc, mem, s);		Value load = builder.create<memref::LoadOp>(loc, mem, s);
		wrengr: Please don't undo this variable naming. The "mem" matches other places, and avoids confusion about whether "ptr" means `MemRef` vs the old "pointer" (now "position") vs llvm-pointers vs...
		Peiming (Author): Sorry, it probably got overlooked during rebasing.
if (!load.getType().isa<IndexType>()) {		if (!load.getType().isa<IndexType>()) {
if (load.getType().getIntOrFloatBitWidth() < 64)		if (load.getType().getIntOrFloatBitWidth() < 64)
load = builder.create<arith::ExtUIOp>(loc, builder.getI64Type(), load);		load = builder.create<arith::ExtUIOp>(loc, builder.getI64Type(), load);
load =		load =
		wrengr: You should just use `ValueRange::getTypes()`
builder.create<arith::IndexCastOp>(loc, builder.getIndexType(), load);		builder.create<arith::IndexCastOp>(loc, builder.getIndexType(), load);
}		}
return load;		return load;
}		}
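The reason `genIndexLoad` emits `arith::ExtUIOp` (zero extension) rather than a sign extension can be seen on plain integers. The following is a minimal sketch with hypothetical helper names, modeling a 32-bit stored position/coordinate. It is not MLIR code.

```cpp
#include <cstddef>
#include <cstdint>

// Zero-extend a narrow stored value to 64 bits, then cast to the index type.
// This models the ExtUIOp + IndexCastOp sequence on host integers.
inline std::size_t toIndexZeroExtended(std::uint32_t raw) {
  std::uint64_t wide = raw; // unsigned widening is a zero extension
  return static_cast<std::size_t>(wide);
}

// For contrast: sign-extending the same bits turns large stored values
// into negative numbers, which would be wrong for positions/coordinates.
inline std::int64_t signExtended(std::uint32_t raw) {
  return static_cast<std::int64_t>(static_cast<std::int32_t>(raw));
}
```

With all 32 bits set, the zero-extended value stays 4294967295, whereas the sign-extended one becomes -1 on two's-complement targets.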

static Value genSliceOffset(OpBuilder &builder, Location loc, Value tensor,		static Value genSliceOffset(OpBuilder &builder, Location loc, Value tensor,
Level lvl) {		Level lvl) {
auto enc = getSparseTensorEncoding(tensor.getType());		auto enc = getSparseTensorEncoding(tensor.getType());
[15 lines not shown]
static Value toSliceCrd(OpBuilder &builder, Location loc, Value crd,		static Value toSliceCrd(OpBuilder &builder, Location loc, Value crd,
Value offset, Value stride, Value tensor, Level lvl) {		Value offset, Value stride, Value tensor, Level lvl) {
// tensorCrd = sliceCrd * stride + offset		// tensorCrd = sliceCrd * stride + offset
crd = builder.create<arith::MulIOp>(loc, crd, stride);		crd = builder.create<arith::MulIOp>(loc, crd, stride);
crd = builder.create<arith::AddIOp>(loc, crd, offset);		crd = builder.create<arith::AddIOp>(loc, crd, offset);
return crd;		return crd;
}		}

		/// Generates code to compute the (absolute) offset of the slice based on the
		/// provide minimum coordinates in the slice. The generated code returns
		/// constant 0 if `isNonEmpty` is false.
		///
		/// offset = isNonEmpty && minCrd >= size ? minCrd - size + 1 : 0;
		static Value offsetFromMinCoord(OpBuilder &builder, Location loc, Value minCrd,
		aartbik: This computation does not match my mental interpretation of the text above (L79)
		Value size, Value isNonEmpty) {
		Value geSize = CMPI(uge, minCrd, size);
		Value pred = builder.create<arith::AndIOp>(loc, isNonEmpty, geSize);
		// offset
		aartbik: "// offset" adds very little. Either use a sentence or remove it.
		Value mp1 = builder.create<arith::AddIOp>(loc, minCrd, C_IDX(1));
		Value mms = builder.create<arith::SubIOp>(loc, mp1, size);
		// This is the absolute offset related to the underly tensor.
		return builder.create<arith::SelectOp>(loc, pred, mms, C_IDX(0));
		}
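The formula in the comment above, `offset = isNonEmpty && minCrd >= size ? minCrd - size + 1 : 0`, can be checked on plain integers. This is a minimal sketch; the `...Model` name and the integer signature are assumptions for illustration, since the function in the diff operates on MLIR `Value`s.

```cpp
#include <cstdint>

// Integer model of offsetFromMinCoord: the smallest window start such that a
// window of `size` coordinates still contains `minCrd`, or 0 if the slice is
// empty or `minCrd` already fits in the window starting at 0.
inline std::uint64_t offsetFromMinCoordModel(std::uint64_t minCrd,
                                             std::uint64_t size,
                                             bool isNonEmpty) {
  return (isNonEmpty && minCrd >= size) ? minCrd - size + 1 : 0;
}
```

For example, with `size = 3` and `minCrd = 5` the window `[3, 6)` is the first one that contains coordinate 5, so the offset is 3.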

/// Converts a coordinate relative to the underlying tensor to the coordinate		/// Converts a coordinate relative to the underlying tensor to the coordinate
/// relative to the slice, returns a extra reminder value		/// relative to the slice, returns a extra reminder value
// FIXME: that description says "tensorCrd -> sliceCrd"; but the function		// FIXME: that description says "tensorCrd -> sliceCrd"; but the function
// name suggests it should be "sliceCrd -> tensorCrd".		// name suggests it should be "sliceCrd -> tensorCrd".
static std::pair<Value, Value> fromSliceCrd(OpBuilder &builder, Location loc,		static std::pair<Value, Value> fromSliceCrd(OpBuilder &builder, Location loc,
Value crd, Value offset,		Value crd, Value offset,
Value stride, Value tensor,		Value stride, Value tensor,
Level lvl) {		Level lvl) {
[16 lines not shown]	LoopEmitter::genSliceLegitPredicate(OpBuilder &builder, Location loc, Value crd,
const auto [newCrd, crdRem] =		const auto [newCrd, crdRem] =
fromSliceCrd(builder, loc, crd, offset, stride, slice, lvl);		fromSliceCrd(builder, loc, crd, offset, stride, slice, lvl);

SmallVector<Value, 3> conds; // at most 3 conditions		SmallVector<Value, 3> conds; // at most 3 conditions

// First, coord >= offset (skip the check if offset is known to be 0).		// First, coord >= offset (skip the check if offset is known to be 0).
if (auto staticOffset = enc.getStaticLvlSliceOffset(lvl);		if (auto staticOffset = enc.getStaticLvlSliceOffset(lvl);
!(staticOffset.has_value() && *staticOffset == 0)) {		!(staticOffset.has_value() && *staticOffset == 0)) {
auto geOffset = builder.create<arith::CmpIOp>(		auto geOffset = CMPI(uge, crd, offset);
loc, arith::CmpIPredicate::uge, crd, offset);
conds.push_back(geOffset);		conds.push_back(geOffset);
}		}

// Second, coord_in_slice < length		// Second, coord_in_slice < length
auto ltLength = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::ult,		auto ltLength = CMPI(ult, newCrd, lvlSizes[tid][lvl]);
newCrd, lvlSizes[tid][lvl]);
conds.push_back(ltLength);		conds.push_back(ltLength);

// Third, rem == 0 (skip the check if stride is known to be 1).		// Third, rem == 0 (skip the check if stride is known to be 1).
if (auto staticStride = enc.getStaticLvlSliceStride(lvl);		if (auto staticStride = enc.getStaticLvlSliceStride(lvl);
!(staticStride.has_value() && *staticStride == 1)) {		!(staticStride.has_value() && *staticStride == 1)) {
auto fitStride = builder.create<arith::CmpIOp>(		auto fitStride = CMPI(eq, crdRem, C_IDX(0));
loc, arith::CmpIPredicate::eq, crdRem, constantIndex(builder, loc, 0));
conds.push_back(fitStride);		conds.push_back(fitStride);
}		}

// Must meet all condition to be a valid coordinate in slice.		// Must meet all condition to be a valid coordinate in slice.
auto pred = conds.front();		auto pred = conds.front();
for (auto cond : ValueRange(conds).drop_front())		for (auto cond : ValueRange(conds).drop_front())
pred = builder.create<arith::AndIOp>(loc, pred, cond);		pred = builder.create<arith::AndIOp>(loc, pred, cond);

return {newCrd, pred};		return {newCrd, pred};
}		}
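The three checks in `genSliceLegitPredicate` can be modeled on plain integers. This is a minimal sketch: the body of `fromSliceCrd` is not shown in this diff, so the division/remainder form below is an assumption derived from inverting `tensorCrd = sliceCrd * stride + offset`, and all names are hypothetical.

```cpp
#include <cstdint>

struct SliceCrdModel {
  std::uint64_t crd; // coordinate relative to the slice
  std::uint64_t rem; // remainder; non-zero means "between strides"
};

// Slice-relative coordinate -> coordinate in the underlying tensor.
inline std::uint64_t sliceToTensorCrd(std::uint64_t sliceCrd,
                                      std::uint64_t offset,
                                      std::uint64_t stride) {
  return sliceCrd * stride + offset;
}

// Assumed inverse: tensor coordinate -> (slice coordinate, remainder).
// The subtraction wraps when tensorCrd < offset; the caller guards for that.
inline SliceCrdModel tensorToSliceCrd(std::uint64_t tensorCrd,
                                      std::uint64_t offset,
                                      std::uint64_t stride) {
  std::uint64_t shifted = tensorCrd - offset;
  return SliceCrdModel{shifted / stride, shifted % stride};
}

// A tensor coordinate lies in the slice iff all three conditions hold.
inline bool isLegitSliceCrd(std::uint64_t tensorCrd, std::uint64_t offset,
                            std::uint64_t stride, std::uint64_t length) {
  SliceCrdModel s = tensorToSliceCrd(tensorCrd, offset, stride);
  return tensorCrd >= offset // first: not before the slice start
         && s.crd < length   // second: within the slice length
         && s.rem == 0;      // third: lands exactly on a stride step
}
```

With `offset = 2`, `stride = 3`, `length = 4`, the slice covers tensor coordinates 2, 5, 8 and 11; everything else fails at least one of the three checks.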

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Sparse tensor loop emitter class implementations		// Sparse tensor loop emitter class implementations
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

Value LoopEmitter::genAddress(OpBuilder &builder, Location loc, TensorId tid,		Value LoopEmitter::genAddress(OpBuilder &builder, Location loc, TensorId tid,
Level lvl, Value crd) {		Level lvl, Value crd) {
Value pos = lvl == 0 ? constantIndex(builder, loc, 0) : posits[tid][lvl - 1];		Value pos = lvl == 0 ? C_IDX(0) : posits[tid][lvl - 1];
Value mul = builder.create<arith::MulIOp>(loc, highs[tid][lvl], pos);		Value mul = builder.create<arith::MulIOp>(loc, highs[tid][lvl], pos);
if (isSparseSlices[tid])		if (isSparseSlices[tid])
crd = toSliceCrd(builder, loc, crd, sliceOffsets[tid][lvl],		crd = toSliceCrd(builder, loc, crd, sliceOffsets[tid][lvl],
sliceStrides[tid][lvl], tensors[tid], lvl);		sliceStrides[tid][lvl], tensors[tid], lvl);
Value add = builder.create<arith::AddIOp>(loc, mul, crd);		Value add = builder.create<arith::AddIOp>(loc, mul, crd);
return add;		return add;
}		}
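`genAddress` linearizes a dense level as `high * parentPos + crd`. A minimal integer sketch, assuming `highs[tid][lvl]` holds the size of the level (the helper name is hypothetical) and ignoring the slice adjustment:

```cpp
#include <cstdint>

// Position at a dense level: the parent position selects a block of
// `lvlSize` entries, and `crd` picks the entry inside that block.
inline std::uint64_t denseAddressModel(std::uint64_t parentPos,
                                       std::uint64_t lvlSize,
                                       std::uint64_t crd) {
  return lvlSize * parentPos + crd;
}
```

For a dense 3x4 tensor, element (2, 1) gets position 2 at level 0 (parent position 0) and position `4 * 2 + 1 = 9` at level 1, which is the usual row-major offset.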

[26 lines not shown]	auto whileOp = builder.create<scf::WhileOp>(
builder.setInsertionPointToStart(ifInBound.elseBlock());		builder.setInsertionPointToStart(ifInBound.elseBlock());
builder.create<scf::YieldOp>(loc, constantI1(builder, loc, false));		builder.create<scf::YieldOp>(loc, constantI1(builder, loc, false));
}		}
builder.create<scf::ConditionOp>(loc, ifInBound.getResults()[0], ivs);		builder.create<scf::ConditionOp>(loc, ifInBound.getResults()[0], ivs);
},		},
/*afterBuilder=*/		/*afterBuilder=*/
[](OpBuilder &builder, Location loc, ValueRange ivs) {		[](OpBuilder &builder, Location loc, ValueRange ivs) {
// pos ++		// pos ++
Value nextPos = builder.create<arith::AddIOp>(		Value nextPos = builder.create<arith::AddIOp>(loc, ivs[0], C_IDX(1));
loc, ivs[0], constantIndex(builder, loc, 1));
builder.create<scf::YieldOp>(loc, nextPos);		builder.create<scf::YieldOp>(loc, nextPos);
});		});
// Return the segment high.		// Return the segment high.
return whileOp.getResult(0);		return whileOp.getResult(0);
}		}

Value LoopEmitter::genSparseCrd(OpBuilder &builder, Location loc, TensorId tid,		Value LoopEmitter::genSparseCrd(OpBuilder &builder, Location loc, TensorId tid,
Level dstLvl) {		Level dstLvl) {
Value crd = constantIndex(builder, loc, 0);		Value crd = C_IDX(0);
const auto reassoc = getCollapseReassociation(tid, dstLvl);		const auto reassoc = getCollapseReassociation(tid, dstLvl);
const unsigned reassocSize = reassoc.size();		const unsigned reassocSize = reassoc.size();
for (unsigned i = 0; i < reassocSize; i++) {		for (unsigned i = 0; i < reassocSize; i++) {
const Level srcLvl = reassoc[i];		const Level srcLvl = reassoc[i];
// A load on the coordinates array yields the coordinate.		// A load on the coordinates array yields the coordinate.
const Value mem = coordinatesBuffers[tid][srcLvl];		const Value mem = coordinatesBuffers[tid][srcLvl];
/// FIXME: See the [CLARIFY_POSITS_LVL] note in the header.		/// FIXME: See the [CLARIFY_POSITS_LVL] note in the header.
const Value pos = posits[tid][dstLvl];		const Value pos = posits[tid][dstLvl];
[42 lines not shown]	void LoopEmitter::initialize(ValueRange ts, StringAttr loopTag, bool hasOutput,
// These zeros will be overwritten below, but we need to initialize		// These zeros will be overwritten below, but we need to initialize
// them to something since we'll need random-access assignment.		// them to something since we'll need random-access assignment.
this->loopIdToOrd.assign(numLoops, 0);		this->loopIdToOrd.assign(numLoops, 0);
this->loopStack.reserve(numLoops);		this->loopStack.reserve(numLoops);
this->loopSeqStack.reserve(numLoops);		this->loopSeqStack.reserve(numLoops);

this->dependentLvlMap.assign(		this->dependentLvlMap.assign(
numTensors, std::vector<std::vector<std::pair<TensorId, Level>>>());		numTensors, std::vector<std::vector<std::pair<TensorId, Level>>>());
		this->slicePosBuffer.assign(numTensors, std::vector<std::vector<Value>>());
		this->sliceSizes.assign(numTensors, std::vector<std::vector<Value>>());
		this->sliceStack.assign(numTensors, std::vector<SliceInfo>());

		aartbik: why the empty lines here?
		this->levelReducedDep.assign(numTensors, std::vector<unsigned>());

		wrengr: You should use the `numTensors` variable instead of calling `tensors.size()` repeatedly. (This is for code clarity rather than performance reasons)
// Initialize nested types of `TensorId`-indexed fields.		// Initialize nested types of `TensorId`-indexed fields.
for (TensorId tid = 0; tid < numTensors; tid++) {		for (TensorId tid = 0; tid < numTensors; tid++) {
const Value t = tensors[tid];		const Value t = tensors[tid];
// a scalar or 0-dimension tensors		// a scalar or 0-dimension tensors
if (isZeroRankedTensorOrScalar(t.getType()))		if (isZeroRankedTensorOrScalar(t.getType()))
continue;		continue;

auto rtp = getRankedTensorType(t);		auto rtp = getRankedTensorType(t);
[25 lines not shown]	for (TensorId tid = 0; tid < numTensors; tid++) {
highs[tid].assign(lvlRank, Value());		highs[tid].assign(lvlRank, Value());
segHi[tid].assign(lvlRank, Value());		segHi[tid].assign(lvlRank, Value());
posits[tid].assign(lvlRank, Value());		posits[tid].assign(lvlRank, Value());
coords[tid].assign(lvlRank, Value());		coords[tid].assign(lvlRank, Value());
positionsBuffers[tid].assign(lvlRank, Value());		positionsBuffers[tid].assign(lvlRank, Value());
coordinatesBuffers[tid].assign(lvlRank, Value());		coordinatesBuffers[tid].assign(lvlRank, Value());
sliceOffsets[tid].assign(lvlRank, Value());		sliceOffsets[tid].assign(lvlRank, Value());
sliceStrides[tid].assign(lvlRank, Value());		sliceStrides[tid].assign(lvlRank, Value());

		// Slice-driven loops related initialization.
		levelReducedDep[tid].assign(lvlRank, 0);
dependentLvlMap[tid].assign(lvlRank,		dependentLvlMap[tid].assign(lvlRank,
std::vector<std::pair<TensorId, Level>>());		std::vector<std::pair<TensorId, Level>>());
		slicePosBuffer[tid].assign(lvlRank, std::vector<Value>());
		sliceSizes[tid].assign(lvlRank, std::vector<Value>());
		sliceStack[tid].emplace_back(/*minCrd=*/Value(),
		/*offset=*/Value(), /*isNonEmpty*/ Value(),
		std::nullopt, 0);
if (dimGetter) {		if (dimGetter) {
auto reassoc = collapseReassoc[tid];		auto reassoc = collapseReassoc[tid];
Level dstRank = reassoc ? reassoc.size() : lvlRank;		Level dstRank = reassoc ? reassoc.size() : lvlRank;
for (Level l = 0; l < dstRank; l++) {		for (Level l = 0; l < dstRank; l++) {
dependentLvlMap[tid][l] = dimGetter(tid, l);		dependentLvlMap[tid][l] = dimGetter(tid, l);
		unsigned depends = dependentLvlMap[tid][l].size();
		if (depends == 0)
		continue;
		wrengr: Please keep this as `Level l`
// TODO: View-base collapse and dependent index reduction are not		// TODO: View-base collapse and dependent index reduction are not
// compatible right now.		// compatible right now.
assert(!reassoc || dependentLvlMap[tid][l].empty());		assert(!reassoc);
		// We need depends - 1 slices to fully resolve the affine expression.
		aartbik: We need `depends - 1` slices, to make sure you don't read "depends" as part of the sentence
		sliceSizes[tid][l].assign(depends - 1, nullptr);
		slicePosBuffer[tid][l].assign(depends - 1, nullptr);
}		}
		wrengr: It would be clearer to combine these together. Also, it would be clearer to use `continue` rather than extra indentation for the conditional. Putting those together, maybe use something like `if (depends == 0) continue; assert(!reassoc); sliceSizes[...`
}		}
}		}

// Construct the inverse of the `topSort` from the sparsifier.		// Construct the inverse of the `topSort` from the sparsifier.
// This is needed to map `AffineDimExpr`s back to the `LoopOrd`		// This is needed to map `AffineDimExpr`s back to the `LoopOrd`
// used in loop emitter.		// used in loop emitter.
// FIXME: This map should be maintained outside loop emitter.		// FIXME: This map should be maintained outside loop emitter.
for (LoopOrd n = 0; n < numLoops; n++)		for (LoopOrd n = 0; n < numLoops; n++)
[83 lines not shown]	if (!enc) {
// Annotated sparse tensors.		// Annotated sparse tensors.
// We also need the value buffer for all-dense annotated "sparse" tensors.		// We also need the value buffer for all-dense annotated "sparse" tensors.
valBuffer[t] = genToValues(builder, loc, tensor);		valBuffer[t] = genToValues(builder, loc, tensor);
}		}
// NOTE: we can also prepare for 0 lvl here in advance, this will hoist		// NOTE: we can also prepare for 0 lvl here in advance, this will hoist
// some loop preparation from tensor iteration, but will also (undesirably)		// some loop preparation from tensor iteration, but will also (undesirably)
// hoist the code ouside if-conditions.		// hoist the code ouside if-conditions.
}		}

		Type indexType = builder.getIndexType();
		Value c0 = constantZero(builder, loc, indexType);
		for (TensorId t = 0, e = tensors.size(); t < e; t++) {
		wrengr: Please use `TensorId` here. I am just about to upload the CLs that make that into a newtype, so it's important to use the correct type instead of just using `size_t`/`unsigned` everywhere
		auto rtp = tensors[t].getType().dyn_cast<RankedTensorType>();
		wrengr: Please define a `dyn_cast` variant of the `getSparseTensorType` function, and use that here and everywhere else. The `SparseTensorType` was created specifically to help avoid several code legibility and correctness concerns, so you should be using it everywhere possible.
		if (!rtp)
		continue;

		Level lvlRank = SparseTensorType(rtp).getLvlRank();
		wrengr: You should be using `SparseTensorType::getLevelRank` here, since it is specifically the level-rank you want not the dim-rank
		for (Level lvl = 0; lvl < lvlRank; lvl++) {
		wrengr: Please use `Level` for all levels. Even though it's just a typedef for now, I will be converting it to a proper type in the near future, so you should use the correct type rather than just using `unsigned` everywhere
		if (!dependentLvlMap[t][lvl].empty()) {
		// Needs at least two operands to form a non-trivial affine expression.
		aartbik: The comment applies to the assert, but the declaration is in between
		ArrayRef<std::pair<TensorId, Level>> depLvls = dependentLvlMap[t][lvl];
		assert(depLvls.size() > 1);

		Value size = c0;
		for (unsigned e = depLvls.size() - 1; e >= 1; e--) {
		auto [dt, dd] = depLvls[e];
		size = builder.create<arith::AddIOp>(loc, size, lvlSizes[dt][dd]);
		sliceSizes[t][lvl][e - 1] = size;
		}
		}
		}
		}
		localInsertPos = builder.getInsertionPoint()->getPrevNode();
}		}
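
The reverse loop over `depLvls` above accumulates level sizes from the last dependent level back toward the first, so `sliceSizes[t][lvl][e - 1]` holds the sum of the sizes of dependent levels `e` through the last one. A minimal scalar sketch of that accumulation follows. It assumes `c0` is the index constant 0 (its definition is not visible in this hunk), and `suffixSliceSizes` and `depSizes` are hypothetical names used only for illustration, not part of `LoopEmitter`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the loop over depLvls: depSizes[e] plays the role
// of lvlSizes[dt][dd] for the e-th dependent (tensor, level) pair.
std::vector<int64_t> suffixSliceSizes(const std::vector<int64_t> &depSizes) {
  // Mirrors assert(depLvls.size() > 1) in the diff.
  assert(depSizes.size() > 1 && "needs at least two operands");
  std::vector<int64_t> sliceSizes(depSizes.size() - 1);
  int64_t size = 0; // Assumption: c0 is the index constant 0.
  // Walk from the last dependent level down to index 1, as the diff does.
  for (std::size_t e = depSizes.size() - 1; e >= 1; e--) {
    size += depSizes[e];
    sliceSizes[e - 1] = size;
  }
  return sliceSizes;
}
```

Under these assumptions, calling `suffixSliceSizes({10, 3, 2})` fills index 1 with 2 and index 0 with 3 + 2; the first entry of the input is never added, matching the `e >= 1` bound in the diff.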

void LoopEmitter::enterNewLoopSeq(OpBuilder &builder, Location loc,		void LoopEmitter::enterNewLoopSeq(OpBuilder &builder, Location loc,
ArrayRef<TensorId> tids,		ArrayRef<TensorId> tids,
ArrayRef<Level> lvls) {		ArrayRef<Level> lvls) {
// TODO: sort		// TODO: sort
assert(loopSeqStack.size() == loopStack.size());		assert(loopSeqStack.size() == loopStack.size());
// Universal Index starts from 0.
loopSeqStack.emplace_back(constantIndex(builder, loc, 0));
// Prepares for all the tensors used in the current loop sequence.		// Prepares for all the tensors used in the current loop sequence.
assert(tids.size() == lvls.size());		std::vector<std::tuple<TensorId, Level, bool>> slicedTids;
for (auto [tid, lvl] : llvm::zip(tids, lvls))		for (auto [tid, lvl] : llvm::zip(tids, lvls)) {
		if (!dependentLvlMap[tid][lvl].empty()) {
		bool fullyRed = genSliceBegin(builder, loc, tid, lvl);
		slicedTids.emplace_back(tid, lvl, fullyRed);
		} else {
prepareLoopOverTensorAtLvl(builder, loc, tid, lvl);		prepareLoopOverTensorAtLvl(builder, loc, tid, lvl);
}		}
		}

		// Universal Index starts from 0.
		loopSeqStack.emplace_back(C_IDX(0), std::move(slicedTids));
		}

		void LoopEmitter::exitCurrentLoopSeq(OpBuilder &builder, Location loc) {
		assert(loopSeqStack.size() == loopStack.size() + 1);

		const auto &slicedTids = loopSeqStack.back().second;
		wrengr: I think it'd be clearer to just use `const auto &` here.

		// Pop out outdated slices.
		aartbik: Please elaborate. "Pop out" is not at all representative for what follows.
		for (auto [tid, lvl, res] : slicedTids) {
		if (!res) {
		assert(sliceStack[tid].back().slicedOnLvl == lvl);
		sliceStack[tid].pop_back();
		// There is an additional item in sliceStack for the input tensor.
		// assert(sliceResolvedConstraints[tid] + 1 == sliceStack[tid].size());
		} else {
		Value c1 = C_IDX(1);
		Value c2 = C_IDX(2);

		// pIdx += 2, we finished the current lvl, advance the pointer index of
		// the previous level by two to skip the [pLo, pHi] for current level.
		// TODO: we could probably use an SSA value for it.
		Value sPtrBuf = slicePosBuffer[tid][lvl].back();
		Value curP = genIndexLoad(builder, loc, sPtrBuf, c1);
		Value nexP = builder.create<arith::AddIOp>(loc, curP, c2);
		builder.create<memref::StoreOp>(loc, nexP, sPtrBuf, c1);
		}
		}
		loopSeqStack.pop_back();
		}

Value LoopEmitter::genAffine(OpBuilder &builder, Location loc, AffineExpr a) {		Value LoopEmitter::genAffine(OpBuilder &builder, Location loc, AffineExpr a) {
switch (a.getKind()) {		switch (a.getKind()) {
case AffineExprKind::DimId: {		case AffineExprKind::DimId: {
// FIXME: since the one callsite in Sparsification passes in a		// FIXME: since the one callsite in Sparsification passes in a
// level-expression, the `getPosition` must in fact be a `Dimension`.		// level-expression, the `getPosition` must in fact be a `Dimension`.
// However, elsewhere we have been led to expect that `loopIdToOrd`		// However, elsewhere we have been led to expect that `loopIdToOrd`
// should be indexed by `LoopId`...		// should be indexed by `LoopId`...
Show All 10 Lines	Value LoopEmitter::genAffine(OpBuilder &builder, Location loc, AffineExpr a) {
case AffineExprKind::Mul: {		case AffineExprKind::Mul: {
auto binOp = a.cast<AffineBinaryOpExpr>();		auto binOp = a.cast<AffineBinaryOpExpr>();
return builder.create<arith::MulIOp>(		return builder.create<arith::MulIOp>(
loc, genAffine(builder, loc, binOp.getLHS()),		loc, genAffine(builder, loc, binOp.getLHS()),
genAffine(builder, loc, binOp.getRHS()));		genAffine(builder, loc, binOp.getRHS()));
}		}
case AffineExprKind::Constant: {		case AffineExprKind::Constant: {
int64_t c = a.cast<AffineConstantExpr>().getValue();		int64_t c = a.cast<AffineConstantExpr>().getValue();
return constantIndex(builder, loc, c);		return C_IDX(c);
}		}
default:		default:
llvm_unreachable("unexpected affine subscript");		llvm_unreachable("unexpected affine subscript");
}		}
}		}
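
`genAffine` above lowers an affine expression by recursing on its kind and emitting one arithmetic op per node. The same recursion can be sketched on plain integers. This is only an illustration: `Expr`, `Kind`, `makeExpr`, and `evalAffine` are hypothetical names, the `Add` case is assumed to mirror the `Mul` case (it falls inside the collapsed lines of the diff), and `Dim` is resolved by a direct lookup rather than through `loopIdToOrd`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature of AffineExprKind, for illustration only.
enum class Kind { Dim, Add, Mul, Constant };

struct Expr {
  Kind kind;
  int64_t value;                  // Constant: the literal; Dim: the position.
  std::shared_ptr<Expr> lhs, rhs; // Operands of Add/Mul, null otherwise.
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeExpr(Kind kind, int64_t value, ExprPtr lhs = nullptr,
                 ExprPtr rhs = nullptr) {
  return std::make_shared<Expr>(Expr{kind, value, std::move(lhs), std::move(rhs)});
}

// Evaluates the expression given the current value of each loop variable.
int64_t evalAffine(const ExprPtr &e, const std::vector<int64_t> &loopIvs) {
  switch (e->kind) {
  case Kind::Dim:
    return loopIvs.at(static_cast<std::size_t>(e->value));
  case Kind::Add:
    return evalAffine(e->lhs, loopIvs) + evalAffine(e->rhs, loopIvs);
  case Kind::Mul:
    return evalAffine(e->lhs, loopIvs) * evalAffine(e->rhs, loopIvs);
  case Kind::Constant:
    return e->value;
  }
  assert(false && "unexpected affine subscript");
  return 0;
}
```

For example, building `2 * d0 + 3` with `makeExpr` and evaluating it with `d0 = 4` multiplies first and then adds the constant, the same order in which `genAffine` would emit `arith.muli` followed by the addition.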

Operation *LoopEmitter::enterLoopOverTensorAtLvl(		Operation *LoopEmitter::emitForLoopOverTensorAtLvl(OpBuilder &builder,
OpBuilder &builder, Location loc, ArrayRef<TensorId> tids,		Location loc, TensorId tid,
ArrayRef<Level> lvls, MutableArrayRef<Value> reduc, bool isParallel) {		Level dstLvl,
// TODO: support multiple return on parallel for?		MutableArrayRef<Value> reduc,
assert(!isParallel || reduc.size() <= 1);		bool isParallel) {
bool isSparseInput = false;		bool isSparseCond = isCompressedDLT(lvlTypes[tid][dstLvl]) ||
TensorId tid = tids.front();		isSingletonDLT(lvlTypes[tid][dstLvl]);
Level dstLvl = lvls.front();
assert(tids.size() == lvls.size());
for (auto [t, l] : llvm::zip(tids, lvls)) {
// TODO: this check for validity of the (t,l) pairs should be
// checked/enforced at the callsites, if possible.
assert(isValidLevel(t, l));
assert(!coords[t][l]); // We cannot re-enter the same level
const auto lvlTp = lvlTypes[t][l];
const bool isSparse = isCompressedDLT(lvlTp) || isSingletonDLT(lvlTp);
// Must be a recognizable level-type.
assert(isSparse || isDenseDLT(lvlTp));
// We can at most have one sparse input, otherwise, a while loop is required
// to co-iterate multiple sparse tensors.
assert(!isSparseInput || !isSparse);
if (isSparse) {
tid = t;
dstLvl = l;
}
isSparseInput = isSparseInput || isSparse;
}

const auto reassoc = getCollapseReassociation(tid, dstLvl);		const auto reassoc = getCollapseReassociation(tid, dstLvl);
// TODO: support dynamic slices.		// TODO: support dynamic slices.
// Use the first source-level here to build the loop bound (which is		// Uses the first dimension here to build the loop bound (which is also the
// also the biggest range).		// biggest range).
const Level srcLvl = reassoc.front();		const Level srcLvl = reassoc.front();
const Value step = constantIndex(builder, loc, 1);		Value step = C_IDX(1);
/// FIXME: See the [CLARIFY_POSITS_LVL] note in the header.		Value lo = isSparseCond ? posits[tid][srcLvl] // current offset
const Value lo = isSparseInput ? posits[tid][srcLvl] // current position		: loopSeqStack.back().first; // universal index
: loopSeqStack.back(); // universal index		Value hi = highs[tid][srcLvl];
const Value hi = highs[tid][srcLvl];

Operation *loop = nullptr;		Operation *loop = nullptr;
Value iv;		Value iv;
if (isParallel) {		if (isParallel) {
assert(collapseReassoc[tid] == nullptr);		assert(collapseReassoc[tid] == nullptr);
scf::ParallelOp parOp =		scf::ParallelOp parOp =
builder.create<scf::ParallelOp>(loc, lo, hi, step, reduc);		builder.create<scf::ParallelOp>(loc, lo, hi, step, reduc);
builder.setInsertionPointToStart(parOp.getBody());		builder.setInsertionPointToStart(parOp.getBody());
Show All 19 Lines	if (isParallel) {
assert(forOp.getNumRegionIterArgs() == reduc.size());		assert(forOp.getNumRegionIterArgs() == reduc.size());
for (int i = 0, e = reduc.size(); i < e; i++)		for (int i = 0, e = reduc.size(); i < e; i++)
reduc[i] = forOp.getRegionIterArg(i);		reduc[i] = forOp.getRegionIterArg(i);
loop = forOp;		loop = forOp;
}		}
assert(loop && iv);		assert(loop && iv);

Value crd;		Value crd;
if (isSparseInput) {		if (isSparseCond) {
assert(reassoc.size() == 1 || isUniqueCOOType(tensors[tid].getType()));		assert(reassoc.size() == 1 || isUniqueCOOType(tensors[tid].getType()));
// For COO, the position is the same across consecutive levels.		// For COO, the position is the same across consecutive levels.
/// FIXME: See the [CLARIFY_POSITS_LVL] note in the header.		/// FIXME: See the [CLARIFY_POSITS_LVL] note in the header.
llvm::for_each(reassoc,		llvm::for_each(reassoc,
[this, tid, iv](Level srcLvl) { posits[tid][srcLvl] = iv; });		[this, tid, iv](Level srcLvl) { posits[tid][srcLvl] = iv; });
crd = genSparseCrd(builder, loc, tid, dstLvl);		crd = genSparseCrd(builder, loc, tid, dstLvl);
} else {		} else {
// Dense tensor, the coordinate is the induction variable.		// Dense tensor, the coordinate is the induction variable.
crd = iv;		crd = iv;
}		}

if (isSparseSlices[tid] && isSparseInput) {		if (isSparseSlices[tid] && isSparseCond) {
// For sparse level slices, we need to filter out invalid coordinates that		// For sparse level slices, we need to filter out invalid coordinates that
// are not included in the slice.		// are not included in the slice.
SmallVector<Type> types;		SmallVector<Type> types;
for (Value red : reduc)		for (Value red : reduc)
types.push_back(red.getType());		types.push_back(red.getType());

auto [trans, pred] = genSliceLegitPredicate(builder, loc, crd, tid, srcLvl);		auto [trans, pred] = genSliceLegitPredicate(builder, loc, crd, tid, srcLvl);
bool hasReduc = !types.empty();		bool hasReduc = !types.empty();
Show All 12 Lines	if (hasReduc) {
builder.create<scf::YieldOp>(loc, reduc);		builder.create<scf::YieldOp>(loc, reduc);
}		}
// Set the insertion point to matched branch.		// Set the insertion point to matched branch.
builder.setInsertionPointToStart(&ifOp.getThenRegion().front());		builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
crd = trans;		crd = trans;
}		}

assert(crd);		assert(crd);
coords[tid][srcLvl] = crd;		coords[tid][dstLvl] = crd;
// NOTE: we can also prepare for next level here in advance		return loop;
// Push the loop into stack		}
loopStack.emplace_back(ArrayRef<TensorId>(tid), ArrayRef<Level>(srcLvl), loop,
builder.getInsertionBlock(), crd, loopTag);		Operation *LoopEmitter::enterLoopOverTensorAtLvl(
		OpBuilder &builder, Location loc, ArrayRef<TensorId> tids,
		ArrayRef<Level> lvls, MutableArrayRef<Value> reduc, bool isParallel) {
		// TODO: support multiple return on parallel for?
		assert(!isParallel || reduc.size() <= 1);
		bool isSparseCond = false, isSliceCond = false;
		size_t tid = tids.front(), lvl = lvls.front();

		for (auto [t, l] : llvm::zip(tids, lvls)) {
		assert(lvlTypes[t].size() > l); // Must be a valid tid, dim pair
		wrengr: Use "l" or "lvl" here. The name "d" is reserved for things of `Dimension` type, whereas this has `Level` type.
		assert(!coords[t][l] || // We cannot re-enter the same level
		!dependentLvlMap[t][l].empty()); // unless it is a slice-driver loop
		auto dimType = lvlTypes[t][l];
		// Must be a recognizable DLT.
		assert(isDenseDLT(dimType) || isCompressedDLT(dimType) ||
		isSingletonDLT(dimType));

		// This is a slice-driven loop.
		if (!dependentLvlMap[t][l].empty()) {
		assert(!isSliceCond && !isSparseCond);
		isSliceCond = true;
		tid = t;
		lvl = l;
		continue;
		}

		bool isSparse = isCompressedDLT(dimType) || isSingletonDLT(dimType);
		// We can at most have one sparse input, otherwise, a while loop is
		// required to co-iterate multiple sparse tensors.
		assert(!isSparseCond || !isSparse);
		assert(!isSliceCond || !isSparseCond);
		if (isSparse) {
		tid = t;
		lvl = l;
		}
		isSparseCond = isSparseCond || isSparse;
		}

		// if the slice is fully reduced, we can now use TACO-based algorithm to
		aartbik: If the slice
		// iterate it.
		Operation *l = nullptr;
		aartbik: I find this block of code extremely hard to read. Any way to factor this into slightly smaller methods and combine these?
		Peiming (author): Better now?
		aartbik: Yes, although it could still use a bit more doc on what each block does (on entry of each block). Also, I would not overuse the "NOTE" part; in principle, all comments are NOTEs and we should only use them when something really should jump out.
		if (isSliceCond) {
		bool fullyReduced = depFullyReduced(tid, lvl);
		if (!fullyReduced) {
		l = emitSliceDrivenLoopOverTensorAtLvl(builder, loc, tid, lvl, reduc);
		} else {
		const SliceInfo &info = getFinalSliceOnLvl(tid, lvl);
		Value offset = info.offset;
		unsigned depth = info.depth - 1;
		Operation *insertPoint = nullptr;
		// TODO: we should generalize the method to support iteration over for
		// normal slices as well to allow early break.
		l = genSliceLvlTraverseLoop(
		builder, loc, posits[tid][lvl], highs[tid][lvl], offset, tid, lvl,
		depth, reduc,
		/*genYield=*/false, // unaware of the yield values from user yet
		[this, tid, lvl, reduc, offset,
		&insertPoint](OpBuilder &builder, Location loc, Value iv,
		MutableArrayRef<Value> innerReduc) {
		assert(innerReduc.size() == reduc.size());
		// Updates users' reduction variable inplace
		for (unsigned i = 0, e = reduc.size(); i < e; i++)
		reduc[i] = innerReduc[i];
		// Loads the coordinates.
		Value absC = genIndexLoad(builder, loc,
		coordinatesBuffers[tid][lvl], iv);

		// We need to subtract the offset to get relative coordinates.
		// TODO: how to assert relC >=0 during runtime?
		insertPoint = builder.create<arith::SubIOp>(loc, absC, offset);
		posits[tid][lvl] = iv;
		coords[tid][lvl] = insertPoint->getResult(0);
		})
		.first;
		// We did not finish the loop body, reset the insertion point and delegate
		// to user.
		builder.setInsertionPointAfter(insertPoint);
		}
		levelReducedDep[tid][lvl]++;
		// NOTE: we can also prepare for next dim here in advance
		// Pushes the loop into stack.
		loopStack.emplace_back(
		ArrayRef<TensorId>(), ArrayRef<Level>(), ArrayRef<TensorId>(tid),
		ArrayRef<Level>(lvl), ArrayRef<bool>(fullyReduced), l,
		builder.getInsertionBlock(), coords[tid][lvl], loopTag);
		} else {
		l = emitForLoopOverTensorAtLvl(builder, loc, tid, lvl, reduc, isParallel);
		// NOTE: we can also prepare for next dim here in advance
		// Pushes the loop into stack.
		loopStack.emplace_back(ArrayRef<TensorId>(tid), ArrayRef<Level>(lvl),
		ArrayRef<TensorId>(), ArrayRef<Level>(),
		ArrayRef<bool>(), l, builder.getInsertionBlock(),
		coords[tid][lvl], loopTag);
		}

// Emit extra locals.		// Emit extra locals.
emitExtraLocalsForTensorsAtDenseLvls(builder, loc, tids, lvls);		emitExtraLocalsForTensorsAtDenseLvls(builder, loc, tids, lvls);
		return l;
return loop;
}		}

Operation *LoopEmitter::enterFilterLoopOverTensorAtLvl(		Operation *LoopEmitter::enterFilterLoopOverTensorAtLvl(
OpBuilder &builder, Location loc, TensorId tid, Level lvl,		OpBuilder &builder, Location loc, TensorId tid, Level lvl,
AffineExpr affine, MutableArrayRef<Value> reduc) {		AffineExpr affine, MutableArrayRef<Value> reduc) {
assert(isValidLevel(tid, lvl));		assert(isValidLevel(tid, lvl));
assert(!affine.isa<AffineDimExpr>() && !isDenseDLT(lvlTypes[tid][lvl]));		assert(!affine.isa<AffineDimExpr>() && !isDenseDLT(lvlTypes[tid][lvl]));
// We can not re-enter the same level.		// We can not re-enter the same level.
assert(!coords[tid][lvl]);		assert(!coords[tid][lvl]);

// TODO: We should instead use a whileOp for filter loop to allow early		// TODO: We should instead use a whileOp for filter loop to allow early
// break when exceeding (for ordered levels).		// break when exceeding (for ordered levels).
// TODO: There are many other potential opportunities that we might apply in		// TODO: There are many other potential opportunities that we might apply in
// the future. E.g., we could use binary search to locate positions.		// the future. E.g., we could use binary search to locate positions.
const Value step = constantIndex(builder, loc, 1);		const Value step = C_IDX(1);
const Value pLo = posits[tid][lvl];		const Value pLo = posits[tid][lvl];
const Value pHi = highs[tid][lvl];		const Value pHi = highs[tid][lvl];
scf::ForOp forOp = builder.create<scf::ForOp>(loc, pLo, pHi, step, reduc);		scf::ForOp forOp = builder.create<scf::ForOp>(loc, pLo, pHi, step, reduc);

// In-place update on the reduction variable vector.		// In-place update on the reduction variable vector.
assert(forOp.getNumRegionIterArgs() == reduc.size());		assert(forOp.getNumRegionIterArgs() == reduc.size());
for (int i = 0, e = reduc.size(); i < e; i++)		for (int i = 0, e = reduc.size(); i < e; i++)
reduc[i] = forOp.getRegionIterArg(i);		reduc[i] = forOp.getRegionIterArg(i);

builder.setInsertionPointToStart(forOp.getBody());		builder.setInsertionPointToStart(forOp.getBody());
// The induction variable gives the position.		// The induction variable gives the position.
const Value pos = forOp.getInductionVar();		const Value pos = forOp.getInductionVar();
posits[tid][lvl] = pos;		posits[tid][lvl] = pos;
		aartbik: Isn't that always the case here? Should that not be part of the method description then?
// Generating a load on the coordinates array yields the crd.		// Generating a load on the coordinates array yields the crd.
const Value mem = coordinatesBuffers[tid][lvl];		const Value mem = coordinatesBuffers[tid][lvl];
const Value crd = genIndexLoad(builder, loc, mem, pos);		const Value crd = genIndexLoad(builder, loc, mem, pos);
coords[tid][lvl] = crd;		coords[tid][lvl] = crd;

// Generate an if-condition to filter out coordinates that are not		// Generate an if-condition to filter out coordinates that are not
// equal to the result of the affine expression.		// equal to the result of the affine expression.
Value expected = genAffine(builder, loc, affine);		Value expected = genAffine(builder, loc, affine);
auto pred = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::eq, crd,		auto pred = CMPI(eq, coords[tid][lvl], expected);
expected);
SmallVector<Type> types;		SmallVector<Type> types;
for (Value red : reduc) {		for (Value red : reduc) {
types.push_back(red.getType());		types.push_back(red.getType());
}		}

bool hasReduc = !types.empty();		bool hasReduc = !types.empty();
scf::IfOp ifOp =		scf::IfOp ifOp =
builder.create<scf::IfOp>(loc, types, pred, /*else*/ hasReduc);		builder.create<scf::IfOp>(loc, types, pred, /*else*/ hasReduc);
Show All 9 Lines	if (hasReduc) {
// On mismatch.		// On mismatch.
builder.create<scf::YieldOp>(loc, reduc);		builder.create<scf::YieldOp>(loc, reduc);
}		}
// Set the insert point to matched branch.		// Set the insert point to matched branch.
builder.setInsertionPointToStart(&ifOp.getThenRegion().front());		builder.setInsertionPointToStart(&ifOp.getThenRegion().front());

// NOTE: we can also prepare for next lvl here in advance		// NOTE: we can also prepare for next lvl here in advance
// Push the loop into stack		// Push the loop into stack
loopStack.emplace_back(ArrayRef<TensorId>(tid), ArrayRef<Level>(lvl), forOp,		loopStack.emplace_back(ArrayRef<TensorId>(tid), ArrayRef<Level>(lvl),
builder.getInsertionBlock(), crd, nullptr);		ArrayRef<TensorId>(), ArrayRef<Level>(),
		ArrayRef<bool>(), forOp, builder.getInsertionBlock(),
		coords[tid][lvl], nullptr);
return forOp;		return forOp;
}		}
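
`enterFilterLoopOverTensorAtLvl` above emits an `scf.for` over the positions `[pLo, pHi)`, loads the coordinate at each position, and guards the loop body with an `scf.if` that compares it to the value of the affine expression. In scalar terms this is a linear scan with an equality filter. The sketch below assumes the coordinates sit in a flat array; `filterPositions` and its parameters are hypothetical names, and it simply collects the matching positions where the generated code would run the user's loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar analogue of the filter loop: visit every position in
// [pLo, pHi) and keep those whose coordinate equals `expected`.
std::vector<std::size_t> filterPositions(const std::vector<int64_t> &crds,
                                         std::size_t pLo, std::size_t pHi,
                                         int64_t expected) {
  assert(pLo <= pHi && pHi <= crds.size());
  std::vector<std::size_t> matches;
  for (std::size_t pos = pLo; pos < pHi; pos++) {
    // The induction variable gives the position; a load yields the crd.
    const int64_t crd = crds[pos];
    // On a match the generated code enters the then-branch of the scf.if.
    if (crd == expected)
      matches.push_back(pos);
  }
  return matches;
}
```

As the TODOs in the diff note, a scan like this cannot stop early; for ordered levels a while-loop or a binary search over the positions would avoid visiting the whole range.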

void LoopEmitter::genDenseAffineAddress(OpBuilder &builder, Location loc,		void LoopEmitter::genDenseAffineAddress(OpBuilder &builder, Location loc,
TensorId tid, Level lvl,		TensorId tid, Level lvl,
AffineExpr lvlExpr) {		AffineExpr lvlExpr) {
assert(isDenseDLT(lvlTypes[tid][lvl]));		assert(isDenseDLT(lvlTypes[tid][lvl]));
// For dense levels, the level-coordinate also serves as the position.		// For dense levels, the level-coordinate also serves as the position.
Value lvlCrd = genAffine(builder, loc, lvlExpr);		Value lvlCrd = genAffine(builder, loc, lvlExpr);
posits[tid][lvl] = genAddress(builder, loc, tid, lvl, lvlCrd);		posits[tid][lvl] = genAddress(builder, loc, tid, lvl, lvlCrd);
}		}

Operation *LoopEmitter::enterCoIterationOverTensorsAtLvls(		Operation *LoopEmitter::enterCoIterationOverTensorsAtLvls(
OpBuilder &builder, Location loc, ArrayRef<TensorId> tids,		OpBuilder &builder, Location loc, ArrayRef<TensorId> tids,
ArrayRef<Level> lvls, bool needsUniv, MutableArrayRef<Value> reduc) {		ArrayRef<Level> lvls, bool needsUniv, MutableArrayRef<Value> reduc) {
		// NOTE: make sure that the slice driven tensor-related reduction variable
		aartbik: A note "make sure" is very ambiguous. Is that a note to self, or something that the code actively does? Much better is to use an affirmative statement.
		// appears first than normal tensors.
		aartbik: "appears first than normal tensors": appears before normal tensors?
assert(tids.size() == lvls.size());		assert(tids.size() == lvls.size());
SmallVector<Type> types;		SmallVector<Type> types;
SmallVector<Value> operands;		SmallVector<Value> operands;
// Construct the while-loop with a parameter for each coordinate.		// Construct the while-loop with a parameter for each coordinate.
const Type indexType = builder.getIndexType();		const Type indexType = builder.getIndexType();
		wrengr: Why remove the const? It's clearer to know when local variables will never change.
		Peiming (author): Rebase mistake.
for (auto [tid, lvl] : llvm::zip(tids, lvls)) {		for (auto [tid, lvl] : llvm::zip(tids, lvls)) {
		// TODO: support coiteration with slice driven tensors.
const auto lvlTp = lvlTypes[tid][lvl];		const auto lvlTp = lvlTypes[tid][lvl];
		assert(dependentLvlMap[tid][lvl].empty() && "TODO: not yet implemented");
if (isCompressedDLT(lvlTp) || isSingletonDLT(lvlTp)) {		if (isCompressedDLT(lvlTp) || isSingletonDLT(lvlTp)) {
const auto reassoc = getCollapseReassociation(tid, lvl);		const auto reassoc = getCollapseReassociation(tid, lvl);
for (unsigned i = 0, e = reassoc.size() - 1; i < e; i++) {		for (unsigned i = 0, e = reassoc.size() - 1; i < e; i++) {
		wrengr: Please don't undo my factoring this out into a local variable. The condition is much easier to read when (1) it's all on one line, and (2) avoids repeating common expressions which forces the reader to double check if they are indeed the same or not.
		Peiming (author): Okay, it is a mistake I made during rebasing.
if (!isUniqueDLT(lvlTypes[tid][reassoc[i]])) {		if (!isUniqueDLT(lvlTypes[tid][reassoc[i]])) {
// This is the segment high for each non-unique levels.		// This is the segment high for each non-unique levels.
types.push_back(indexType);		types.push_back(indexType);
operands.push_back(constantIndex(builder, loc, 0));		operands.push_back(C_IDX(0));
}		}
}		}
const auto pos = posits[tid][reassoc.front()];		const auto pos = posits[tid][reassoc.front()];
assert(pos);		assert(pos);
types.push_back(indexType);		types.push_back(indexType);
operands.push_back(pos);		operands.push_back(pos);
}		}
}		}
// The position where user-supplied reduction variable starts.		// The position where user-supplied reduction variable starts.
for (Value rec : reduc) {		for (Value rec : reduc) {
types.push_back(rec.getType());		types.push_back(rec.getType());
operands.push_back(rec);		operands.push_back(rec);
}		}
if (needsUniv) {		if (needsUniv) {
types.push_back(indexType);		types.push_back(indexType);
// Update universal index.		// Update universal index.
operands.push_back(loopSeqStack.back());		operands.push_back(loopSeqStack.back().first);
}		}
assert(types.size() == operands.size());		assert(types.size() == operands.size());
scf::WhileOp whileOp = builder.create<scf::WhileOp>(loc, types, operands);		scf::WhileOp whileOp = builder.create<scf::WhileOp>(loc, types, operands);

SmallVector<Location> locs(types.size(), loc);		SmallVector<Location> locs(types.size(), loc);
Block *before = builder.createBlock(&whileOp.getBefore(), {}, types, locs);		Block *before = builder.createBlock(&whileOp.getBefore(), {}, types, locs);
Block *after = builder.createBlock(&whileOp.getAfter(), {}, types, locs);		Block *after = builder.createBlock(&whileOp.getAfter(), {}, types, locs);

Show All 12 Lines	if (isCompressedDLT(lvlTp) || isSingletonDLT(lvlTp)) {
if (!isUniqueDLT(lvlTypes[tid][reassoc[i]])) {		if (!isUniqueDLT(lvlTypes[tid][reassoc[i]])) {
// Links the SSA chain for segHi.		// Links the SSA chain for segHi.
segHi[tid][reassoc[i]] = after->getArgument(o++);		segHi[tid][reassoc[i]] = after->getArgument(o++);
}		}
}		}
Value op1 = before->getArgument(o);		Value op1 = before->getArgument(o);
// We used the first level bound as the bound the collapsed set of levels.		// We used the first level bound as the bound the collapsed set of levels.
Value op2 = highs[tid][reassoc.front()];		Value op2 = highs[tid][reassoc.front()];
Value opc = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::ult,		Value opc = CMPI(ult, op1, op2);
op1, op2);
cond = cond ? builder.create<arith::AndIOp>(loc, cond, opc) : opc;		cond = cond ? builder.create<arith::AndIOp>(loc, cond, opc) : opc;
// Update positions		// Update positions
Value pos = after->getArgument(o++);		Value pos = after->getArgument(o++);
// For COO, the position is the same across consecutive levels.		// For COO, the position is the same across consecutive levels.
/// FIXME: See the [CLARIFY_POSITS_LVL] note in the header.		/// FIXME: See the [CLARIFY_POSITS_LVL] note in the header.
llvm::for_each(reassoc, [this, tid, pos](Level srcLvl) {		llvm::for_each(reassoc, [this, tid, pos](Level srcLvl) {
posits[tid][srcLvl] = pos;		posits[tid][srcLvl] = pos;
});		});
Show All 27 Lines	if (!slicesPreds.empty()) {
SmallVector<Value> yields(after->getArguments());		SmallVector<Value> yields(after->getArguments());
// Generates a list of if statements		// Generates a list of if statements
// pos = in_slice ? pos : pos + 1		// pos = in_slice ? pos : pos + 1
// TODO: instead of always picking pos + 1, we should set pos = high to		// TODO: instead of always picking pos + 1, we should set pos = high to
// break to loop if the coordinates are larger than the slice size.		// break to loop if the coordinates are larger than the slice size.
//		//
// This "idx" is the index into `llvm::zip(tids, lvls)`		// This "idx" is the index into `llvm::zip(tids, lvls)`
for (auto [pred, idx] : slicesPreds) {		for (auto [pred, idx] : slicesPreds) {
Value nextPos = builder.create<arith::AddIOp>(		Value nextPos = builder.create<arith::AddIOp>(loc, yields[idx], C_IDX(1));
loc, yields[idx], constantIndex(builder, loc, 1));
yields[idx] =		yields[idx] =
builder.create<arith::SelectOp>(loc, pred, yields[idx], nextPos);		builder.create<arith::SelectOp>(loc, pred, yields[idx], nextPos);
}		}

Value pred = slicesPreds.front().first;		Value pred = slicesPreds.front().first;
for (int i = 1, e = slicesPreds.size(); i < e; i++) {		for (int i = 1, e = slicesPreds.size(); i < e; i++) {
pred = builder.create<arith::AndIOp>(loc, pred, slicesPreds[i].first);		pred = builder.create<arith::AndIOp>(loc, pred, slicesPreds[i].first);
}		}
Show All 13 Lines	Operation *LoopEmitter::enterCoIterationOverTensorsAtLvls(
Value min;		Value min;
// Finds the minimum coordinate		// Finds the minimum coordinate
if (!needsUniv) {		if (!needsUniv) {
for (auto [tid, lvl] : llvm::zip(tids, lvls)) {		for (auto [tid, lvl] : llvm::zip(tids, lvls)) {
const auto lvlTp = lvlTypes[tid][lvl];		const auto lvlTp = lvlTypes[tid][lvl];
if (isCompressedDLT(lvlTp) || isSingletonDLT(lvlTp)) {		if (isCompressedDLT(lvlTp) || isSingletonDLT(lvlTp)) {
const auto crd = coords[tid][lvl];		const auto crd = coords[tid][lvl];
if (min) {		if (min) {
Value cmp = builder.create<arith::CmpIOp>(		Value cmp = CMPI(ult, coords[tid][lvl], min);
loc, arith::CmpIPredicate::ult, crd, min);		min =
min = builder.create<arith::SelectOp>(loc, cmp, crd, min);		builder.create<arith::SelectOp>(loc, cmp, coords[tid][lvl], min);
} else {		} else {
min = crd;		min = crd;
}		}
}		}
}		}
} else {		} else {
assert(!min);		assert(!min);
// Otherwise, universal index is the minimal pos.		// Otherwise, universal index is the minimal pos.
min = after->getArguments().back();		min = after->getArguments().back();
}		}

// Sets up the loop stack.		// Sets up the loop stack.
loopStack.emplace_back(tids, lvls, whileOp, builder.getInsertionBlock(), min,		loopStack.emplace_back(tids, lvls, ArrayRef<TensorId>(), ArrayRef<Level>(),
loopTag);		ArrayRef<bool>(), whileOp, builder.getInsertionBlock(),
		min, loopTag);
assert(loopStack.size() == loopSeqStack.size());		assert(loopStack.size() == loopSeqStack.size());

for (auto [tid, dstLvl] : llvm::zip(tids, lvls)) {		for (auto [tid, dstLvl] : llvm::zip(tids, lvls)) {
const auto reassoc = getCollapseReassociation(tid, dstLvl);		const auto reassoc = getCollapseReassociation(tid, dstLvl);
assert(reassoc.size() == 1 || isUniqueCOOType(tensors[tid].getType()));		assert(reassoc.size() == 1 || isUniqueCOOType(tensors[tid].getType()));
// TODO: Refactors this into smaller functions.		// TODO: Refactors this into smaller functions.
// NOTE: For all the collapsed level (except for the last one, that is why		// NOTE: For all the collapsed level (except for the last one, that is why
// the loop ends with `reassoc.size() - 1`), as each iteration is advanced		// the loop ends with `reassoc.size() - 1`), as each iteration is advanced
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines
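
The while-loop built by `enterCoIterationOverTensorsAtLvls` keeps running while every sparse operand's position is below its bound, and, when no universal index is needed, takes the minimum of the current coordinates as the loop's coordinate. The sketch below shows that pattern for two sorted coordinate arrays. It is a simplified illustration, not the emitter's logic: `coIterate` is a hypothetical name, and the position advance is written inline here, whereas the emitter performs the induction when the loop is exited.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical two-operand co-iteration: returns the minimum coordinate
// selected at each iteration of the while-loop.
std::vector<int64_t> coIterate(const std::vector<int64_t> &crdA,
                               const std::vector<int64_t> &crdB) {
  std::vector<int64_t> visited;
  std::size_t pA = 0, pB = 0;
  // Conjunction of the per-tensor "pos < hi" conditions.
  while (pA < crdA.size() && pB < crdB.size()) {
    const int64_t a = crdA[pA], b = crdB[pB];
    const int64_t minCrd = std::min(a, b);
    visited.push_back(minCrd);
    // Advance every operand whose coordinate matched the minimum.
    if (a == minCrd)
      pA++;
    if (b == minCrd)
      pB++;
  }
  return visited;
}
```

Because the condition is a conjunction, the loop stops as soon as one operand is exhausted; any remaining coordinates of the other operand would have to be handled by a subsequent loop.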
void LoopEmitter::prepareLoopOverTensorAtLvl(OpBuilder &builder, Location loc,		void LoopEmitter::prepareLoopOverTensorAtLvl(OpBuilder &builder, Location loc,
TensorId tid, Level dstLvl) {		TensorId tid, Level dstLvl) {
assert(isValidLevel(tid, dstLvl));		assert(isValidLevel(tid, dstLvl));
const auto lvlTp = lvlTypes[tid][dstLvl];		const auto lvlTp = lvlTypes[tid][dstLvl];

if (isDenseDLT(lvlTp))		if (isDenseDLT(lvlTp))
return;		return;

const Value c0 = constantIndex(builder, loc, 0);		const Value c0 = C_IDX(0);
const Value c1 = constantIndex(builder, loc, 1);		const Value c1 = C_IDX(1);
for (const Level srcLvl : getCollapseReassociation(tid, dstLvl)) {		for (const Level srcLvl : getCollapseReassociation(tid, dstLvl)) {
// Either the first level, or the previous level has been set.		// Either the first level, or the previous level has been set.
/// FIXME: See the [CLARIFY_POSITS_LVL] note in the header.		/// FIXME: See the [CLARIFY_POSITS_LVL] note in the header.
assert(srcLvl == 0 || posits[tid][srcLvl - 1]);		assert(srcLvl == 0 || posits[tid][srcLvl - 1]);
if (!isCompressedDLT(lvlTp) && !isSingletonDLT(lvlTp))		if (!isCompressedDLT(lvlTp) && !isSingletonDLT(lvlTp))
continue;		continue;
if (isCompressedDLT(lvlTp)) {		if (isCompressedDLT(lvlTp)) {
const Value mem = positionsBuffers[tid][srcLvl];		const Value mem = positionsBuffers[tid][srcLvl];
▲ Show 20 Lines • Show All 135 Lines • ▼ Show 20 Lines
}		}

void LoopEmitter::exitWhileLoop(OpBuilder &builder, Location loc,		void LoopEmitter::exitWhileLoop(OpBuilder &builder, Location loc,
MutableArrayRef<Value> reduc) {		MutableArrayRef<Value> reduc) {
const LoopInfo &loopInfo = loopStack.back();		const LoopInfo &loopInfo = loopStack.back();
auto whileOp = llvm::cast<scf::WhileOp>(loopInfo.loop);		auto whileOp = llvm::cast<scf::WhileOp>(loopInfo.loop);
builder.setInsertionPointToEnd(loopInfo.userCodeBlock);		builder.setInsertionPointToEnd(loopInfo.userCodeBlock);
Value iv = loopInfo.iv;		Value iv = loopInfo.iv;

// Finalize the induction. Note that the induction could be performed		// Finalize the induction. Note that the induction could be performed
// in the individual if-branches to avoid re-evaluating the conditions.		// in the individual if-branches to avoid re-evaluating the conditions.
// However, that would result in a rather elaborate forest of yield		// However, that would result in a rather elaborate forest of yield
// instructions during code generation. Moreover, performing the induction		// instructions during code generation. Moreover, performing the induction
// after the if-statements more closely resembles code generated by TACO.		// after the if-statements more closely resembles code generated by TACO.
unsigned o = 0;		unsigned o = 0;
SmallVector<Value> operands;		SmallVector<Value> operands;
Value one = constantIndex(builder, loc, 1);		unsigned delta = 0;
		for (auto [tid, lvl, resolved] : llvm::zip(
		loopInfo.slicedTids, loopInfo.slicedLvls, loopInfo.sliceReduced)) {
		levelReducedDep[tid][lvl]--;
		if (!resolved) {
		genSliceNextInduction(builder, loc, whileOp, tid, lvl, operands, o);
		} else {
		wrengr: It would be clearer to use `break` in the then-branch. That keeps you from needing to indent the else-branch (which is very long), and helps the reader avoid needing to check to see if there's something else after the else-branch.
		Peiming (author): I found `else` easier to follow, because the control flow is more straightforward.
		aartbik: I agree the else is very long and deep. Why not `if (!resolved) { genSlice...; continue; } ...`?
		// TODO: We need to distinguish coiterate loop with slice-driven loop and
		// fully reduced while op for iterating one slices.
		// FIXME: since we didn't implement coiteration, this must be iteration
		// just on fully resolved slice.
		assert(loopInfo.slicedTids.size() == 1 && loopInfo.tids.empty());
		// The if guard to filter out out-range coordinates.
		assert(llvm::isa<scf::IfOp>(builder.getInsertionBlock()->getParentOp()));
		posits[tid][lvl] = whileOp->getResult(o++);
		// FIXME: we are not using continue here since we do not support
		// coiteration on slices. But it need to be treated similarly as the
		// universal index.
		o++; // skip continue flag.
		// Since we did not push two results from whileOp. The size of the
		// operands vector is smaller than the actual number of return values from
		// the whileOp.
		// It is because we are actually generate yield in the IfOp inside the
		// whileOp to only iterates over inbound coordinates within the slices.
		delta += 2;
		}
		};

		Value one = C_IDX(1);
for (auto [tid, dstLvl] : llvm::zip(loopInfo.tids, loopInfo.lvls)) {		for (auto [tid, dstLvl] : llvm::zip(loopInfo.tids, loopInfo.lvls)) {
const auto lvlTp = lvlTypes[tid][dstLvl];		const auto lvlTp = lvlTypes[tid][dstLvl];
if (isCompressedDLT(lvlTp) || isSingletonDLT(lvlTp)) {		if (isCompressedDLT(lvlTp) || isSingletonDLT(lvlTp)) {
const auto reassoc = getCollapseReassociation(tid, dstLvl);		const auto reassoc = getCollapseReassociation(tid, dstLvl);
assert(reassoc.size() == 1 || isUniqueCOOType(tensors[tid].getType()));		assert(reassoc.size() == 1 || isUniqueCOOType(tensors[tid].getType()));
for (unsigned i = 0, e = reassoc.size() - 1; i < e; i++) {		for (unsigned i = 0, e = reassoc.size() - 1; i < e; i++) {
const Level srcLvl = reassoc[i];		const Level srcLvl = reassoc[i];
if (!isUniqueDLT(lvlTypes[tid][srcLvl])) {		if (!isUniqueDLT(lvlTypes[tid][srcLvl])) {
operands.push_back(segHi[tid][srcLvl]);		operands.push_back(segHi[tid][srcLvl]);
o++;		o++;
}		}
}		}
const Value crd = coords[tid][dstLvl];		const Value crd = coords[tid][dstLvl];
const Value pos = posits[tid][dstLvl];		const Value pos = posits[tid][dstLvl];
Value cmp =		Value cmp = CMPI(eq, crd, iv);
builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::eq, crd, iv);
// If the loop contains a coiteration with non-unique level, we fast		// If the loop contains a coiteration with non-unique level, we fast
// forward all the duplicated coords by setting the position to the		// forward all the duplicated coords by setting the position to the
// segment high.		// segment high.
Value add = !isUniqueDLT(lvlTypes[tid][reassoc.back()])		Value add = !isUniqueDLT(lvlTypes[tid][reassoc.back()])
? segHi[tid][reassoc.back()]		? segHi[tid][reassoc.back()]
: builder.create<arith::AddIOp>(loc, pos, one);		: builder.create<arith::AddIOp>(loc, pos, one);

operands.push_back(builder.create<arith::SelectOp>(loc, cmp, add, pos));		operands.push_back(builder.create<arith::SelectOp>(loc, cmp, add, pos));
[... 18 lines not shown ...]	void LoopEmitter::exitWhileLoop(OpBuilder &builder, Location loc,
// Reduction value from users.		// Reduction value from users.
for (auto &i : reduc) {		for (auto &i : reduc) {
operands.push_back(i);		operands.push_back(i);
// In place update reduction variable.		// In place update reduction variable.
i = whileOp->getResult(o++);		i = whileOp->getResult(o++);
}		}

// An (optional) universal index.		// An (optional) universal index.
if (operands.size() < whileOp.getNumResults()) {		if (operands.size() + delta < whileOp.getNumResults()) {
assert(operands.size() + 1 == whileOp.getNumResults());		assert(operands.size() + delta + 1 == whileOp.getNumResults());
// The last one is the universial index.		// The last one is the universial index.
operands.push_back(builder.create<arith::AddIOp>(loc, iv, one));		operands.push_back(builder.create<arith::AddIOp>(loc, iv, one));
// update the loop starting point of current loop sequence		// update the loop starting point of current loop sequence
loopSeqStack.back() = whileOp->getResult(o++);		loopSeqStack.back().first = whileOp->getResult(o++);
}		}

assert(o == operands.size());		assert(o == operands.size() + delta);
builder.create<scf::YieldOp>(loc, operands);		builder.create<scf::YieldOp>(loc, operands);
builder.setInsertionPointAfter(whileOp);		builder.setInsertionPointAfter(whileOp);
}		}
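
The inline thread on the `if (!resolved)` branch of `exitWhileLoop` is about keeping the long resolved-slice case from being nested inside an `else`. The following is a minimal sketch of the early-`continue` shape, not the patch's code: `SliceEntry` and `countSkippedResults` are hypothetical stand-ins, and the call to `genSliceNextInduction` is replaced by a counter. It uses `continue` rather than `break`, because `break` would stop visiting the remaining slices.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one (tid, lvl, resolved) triple from the patch.
struct SliceEntry {
  unsigned tid;
  unsigned lvl;
  bool resolved;
};

// Models only the bookkeeping of the loop under review. Unresolved slices are
// handled first and skipped with `continue`, so the resolved case needs no
// `else` and no extra indentation.
// Returns the number of while-op results that have no matching yield operand
// (the role `delta` plays in the patch).
inline unsigned countSkippedResults(const std::vector<SliceEntry> &slices,
                                    unsigned &unresolvedSeen) {
  unsigned delta = 0;
  for (const SliceEntry &s : slices) {
    if (!s.resolved) {
      ++unresolvedSeen; // stands in for genSliceNextInduction(...)
      continue;
    }
    // Resolved slice: the position and the continue flag are both skipped.
    delta += 2;
  }
  return delta;
}
```

With one unresolved and two resolved entries, the function returns 4 and increments the counter once.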

void LoopEmitter::exitCurrentLoop(RewriterBase &rewriter, Location loc,		void LoopEmitter::exitCurrentLoop(RewriterBase &rewriter, Location loc,
MutableArrayRef<Value> reduc) {		MutableArrayRef<Value> reduc) {
// Clean up the values, it would help use to discover potential bug at a		// Clean up the values, it would help use to discover potential bug at a
// earlier stage (instead of silently using a wrong value).		// earlier stage (instead of silently using a wrong value).
const LoopInfo &loopInfo = loopStack.back();		const LoopInfo &loopInfo = loopStack.back();
assert(loopInfo.tids.size() == loopInfo.lvls.size());		assert(loopInfo.tids.size() == loopInfo.lvls.size());
SmallVector<Value> red;		SmallVector<Value> red;
if (llvm::isa<scf::WhileOp>(loopInfo.loop)) {		if (llvm::isa<scf::WhileOp>(loopInfo.loop)) {
exitWhileLoop(rewriter, loc, reduc);		exitWhileLoop(rewriter, loc, reduc);
} else {		} else {
exitForLoop(rewriter, loc, reduc);		exitForLoop(rewriter, loc, reduc);
}		}

assert(loopStack.size() == loopSeqStack.size());		assert(loopStack.size() == loopSeqStack.size());
loopStack.pop_back();		loopStack.pop_back();
}		}

		//===----------------------------------------------------------------------===//
		// Slice-driven loop related methods.
		aartbik: Ok, this block is where all the magic happens ;-) I need to do one more careful pass over this...
		//===----------------------------------------------------------------------===//

		LoopEmitter::SliceInfo &LoopEmitter::getFinalSliceOnLvl(TensorId tid,
		aartbik: I think this still needs some work to make reading the block easier. The problem is that you have very concise comments in the header (Generates ...), which is okay, since I don't want to see more there, but very few comments here, where it matters. So I would still give every implementation function here an entry comment, but one that shows what is generated, using some pseudo-code of the output. That way, on entry of each method, I know what to expect, and can dive into the various blocks with more pre-knowledge of what they do. WDYT?
		Level lvl) {
		for (auto it = sliceStack[tid].rbegin(), ie = sliceStack[tid].rend(); it < ie;
		it++) {
		if (it->slicedOnLvl == lvl) {
		assert(it->depth == dependentLvlMap[tid][lvl].size() - 1);
		return *it;
		}
		}

		llvm_unreachable("Failed to find sliceInfo");
		}

		size_t LoopEmitter::remDepOnLevel(TensorId tid, Level lvl) const {
		aartbik: This one seems out of place (all others generate stuff); perhaps move it up or down in the method order (also in the header).
		size_t totalDependencies = dependentLvlMap[tid][lvl].size();
		if (totalDependencies != 0) {
		assert(totalDependencies >= 2);
		return totalDependencies - levelReducedDep[tid][lvl];
		}
		return totalDependencies;
		}

		std::pair<Operation *, ValueRange> LoopEmitter::genSliceLvlTraverseLoop(
		OpBuilder &builder, Location loc, Value loopLo, Value loopHi, Value offset,
		TensorId tid, Level lvl, size_t depth, ValueRange userReduc, bool genYield,
		llvm::function_ref<void(OpBuilder &, Location, Value,
		MutableArrayRef<Value>)>
		bodyBuilder) {
		Value c1 = C_IDX(1);
		Value sliceHi =
		builder.create<arith::AddIOp>(loc, offset, sliceSizes[tid][lvl].back());

		SmallVector<Value> reduc = {
		loopLo, // loop lower bounds
		constantI1(builder, loc, true), // continue
		};
		// Append user required reduction value.
		reduc.append(userReduc.begin(), userReduc.end());
		scf::WhileOp whileOp = builder.create<scf::WhileOp>(
		loc, ValueRange(reduc).getTypes(), reduc,
		/*beforeBuilder=*/
		[loopHi](OpBuilder &builder, Location loc, ValueRange args) {
		Value lo = args[0];
		Value cont = args[1];
		Value inBound = CMPI(ult, lo, loopHi);
		Value cond = builder.create<arith::AndIOp>(loc, cont, inBound);
		// continue if not yet break nor out of bound.
		builder.create<scf::ConditionOp>(loc, cond, args);
		},
		/*afterBuilder=*/
		[this, c1, tid, lvl, sliceHi, genYield,
		bodyBuilder](OpBuilder &builder, Location loc, ValueRange args) {
		Value iv = args[0];
		Value coord =
		genIndexLoad(builder, loc, coordinatesBuffers[tid][lvl], iv);
		// If coord < sliceHi
		Value cont = CMPI(ult, coord, sliceHi);

		TypeRange types = args.drop_front(2).getTypes();
		auto ifOp = builder.create<scf::IfOp>(loc, types, cont, true);
		{
		// 2 reduction variable maintained by us.
		SmallVector<Value> ifRet = args.drop_front(2);
		assert(ifRet.size() == args.size() - 2);

		OpBuilder::InsertionGuard guard(builder);
		// If not in slice.
		// Break the while loop (by setting continue to false)
		builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
		builder.create<scf::YieldOp>(loc, ifRet);

		aartbik: Here and in a few other places there is no period at the end; please make one last pass over all new comments here.
		// If this is a legit coordinates in slice
		builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
		bodyBuilder(builder, loc, iv, ifRet);
		if (genYield) {
		builder.setInsertionPointToEnd(&ifOp.getThenRegion().front());
		builder.create<scf::YieldOp>(loc, ifRet);
		}
		}
		// Marks this speical ifOp to avoid sparisification finalizing it.
		ifOp->setAttr(getLoopEmitterLoopAttrName(),
		StringAttr::get(builder.getContext(), "slice"));
		// Insertion point restored to after ifOp.
		SmallVector<Value> yields;
		// Increase induction variable.
		yields.push_back(builder.create<arith::AddIOp>(loc, iv, c1));
		yields.push_back(cont);
		yields.append(ifOp.getResults().begin(), ifOp.getResults().end());
		builder.create<scf::YieldOp>(loc, yields);
		});

		builder.setInsertionPointAfter(whileOp);
		return std::make_pair(whileOp, whileOp.getResults().drop_front(2));
		}
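
The reviewers ask for pseudo-code of what each generator emits. As one reading of the builder calls in `genSliceLvlTraverseLoop`, the emitted `scf.while` behaves roughly like the plain C++ loop below. This is a sketch under assumptions: `traverseSliceLevel` is a hypothetical name, `crds` stands in for the level's coordinates buffer, `body` stands in for `bodyBuilder`, and user reduction values are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Plain C++ model of the emitted loop. It walks positions [loopLo, loopHi)
// and calls `body` for each position whose coordinate is below
// offset + size. The first coordinate at or above that bound clears the
// continue flag, which ends the loop.
inline void traverseSliceLevel(const std::vector<std::size_t> &crds,
                               std::size_t loopLo, std::size_t loopHi,
                               std::size_t offset, std::size_t size,
                               const std::function<void(std::size_t)> &body) {
  const std::size_t sliceHi = offset + size;
  std::size_t pos = loopLo;
  bool cont = true;
  while (cont && pos < loopHi) {
    cont = crds[pos] < sliceHi; // "If coord < sliceHi" in the patch
    if (cont)
      body(pos);
    ++pos; // the induction variable advances on every iteration
  }
}
```

For coordinates `{1, 3, 4, 7, 9}` with `offset = 2` and `size = 3`, the body runs for positions 0, 1 and 2, and the loop stops when it reaches coordinate 7.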

		ValueRange LoopEmitter::genUnResolvedSliceTreeTraverse(
		OpBuilder &builder, Location loc, Value offset, TensorId tid, Level lvl,
		size_t depth, ValueRange userReduc,
		llvm::function_ref<void(OpBuilder &, Location, Value,
		MutableArrayRef<Value>)>
		bodyBuilder) {

		Value c0 = C_IDX(0), c1 = C_IDX(1), c2 = C_IDX(2);

		// TODO: it only works on all compressed tensor.
		Value sPtrBuf = slicePosBuffer[tid][lvl][depth];
		Value pSt = c2; // pointer starting index
		Value mSz = genIndexLoad(builder, loc, sPtrBuf, c0); // memSize

		auto forOp =
		scf::buildLoopNest(
		builder, loc, pSt, mSz, c2, userReduc,
		[this, c1, depth, tid, lvl, offset, sPtrBuf,
		bodyBuilder](OpBuilder &builder, Location loc, ValueRange ivs,
		ValueRange iterArgs) -> scf::ValueVector {
		// generate traversal for each level.
		Value loopLo = genIndexLoad(builder, loc, sPtrBuf, ivs.front());
		Value loopHi = genIndexLoad(
		builder, loc, sPtrBuf,
		builder.create<arith::AddIOp>(loc, ivs.front(), c1));
		return genSliceLvlTraverseLoop(builder, loc, loopLo, loopHi, offset,
		tid, lvl, depth, iterArgs, true,
		bodyBuilder)
		.second;
		})
		.loops.front();

		// Insert after current while operation.
		builder.setInsertionPointAfter(forOp);
		return forOp.getResults();
		}

		void LoopEmitter::genResolvedSliceBegin(OpBuilder &builder, Location loc,
		TensorId tid, Level lvl) {
		assert(lvl == 0 && "TODO: handle non-first level");
		Value c0 = C_IDX(0), c1 = C_IDX(1), c2 = C_IDX(2), c3 = C_IDX(3),
		c4 = C_IDX(4);
		Value size = sliceSizes[tid][0][0];
		Value sPtrBuf = slicePosBuffer[tid][0][0];
		Value pHi = genIndexLoad(builder, loc, positionsBuffers[tid][0], c1);
		// Fills out pIdxBuffer[tid][lvl][0] with [/*memSize =*/4, 0, 0, pHi]
		builder.create<memref::StoreOp>(loc, c4, sPtrBuf, c0); // memSize = 4
		builder.create<memref::StoreOp>(loc, c0, sPtrBuf, c1); // index = 0
		builder.create<memref::StoreOp>(loc, c0, sPtrBuf, c2); // pLo = 0;
		builder.create<memref::StoreOp>(loc, pHi, sPtrBuf, c3); // loaded pHi.

		// This is an non empty tensor if 0 < pHi.
		Value isNonEmpty = CMPI(ult, c0, pHi);
		// The minimal coord must be at the first on ordered level.
		// FIXME: Technically we should load the coord only when the slice is
		// nonempty. though we assume that even on empty sparse tensors, a non-empty
		// ptr/idx buffer is allocated for each level so it would not cause OOB to
		// avoid generating a ifOp here.
		Value minCrd = genIndexLoad(builder, loc, coordinatesBuffers[tid][0], c0);

		// FIXME: We need the relative offset related to the base slice.
		Value absOffset = offsetFromMinCoord(builder, loc, minCrd, size, isNonEmpty);
		sliceStack[tid].emplace_back(minCrd, absOffset, isNonEmpty, lvl, /*depth=*/1);
		}

		void LoopEmitter::genUnResolvedSliceBegin(OpBuilder &builder, Location loc,
		TensorId tid, Level lvl) {
		assert(isCompressedDLT(lvlTypes[tid][lvl]));
		Value c0 = C_IDX(0), c1 = C_IDX(1), c2 = C_IDX(2);
		const SliceInfo &sliceInfo = sliceStack[tid].back();
		unsigned prevLvl = *sliceInfo.slicedOnLvl;
		assert(lvl >= prevLvl);
		// Either lvl = prevSlicedLvl, i.e., t[d0 + d1 + d2,...] (more than one
		// variable need to be reduced on the same level).
		// Or lvl > prevSliceLvl + 1, i.e., t[..., d2, d3 + d4] (having a
		// simple dim expression in between).
		assert(lvl == prevLvl + 1 && "TODO: not yet implemented");
		// Check slice stack integrity.
		assert(slicePosBuffer[tid][prevLvl].size() == sliceInfo.depth);
		Value sPtrBuf = slicePosBuffer[tid][lvl].back();
		SmallVector<Value, 3> reduc = {
		constantI1(builder, loc, false), // isNonEmpty
		lvlSizes[tid][lvl], // minCoord
		c2, // memSize
		};

		ValueRange result = genUnResolvedSliceTreeTraverse(
		builder, loc, sliceInfo.offset, tid, prevLvl, sliceInfo.depth - 1, reduc,
		[this, c1, c2, tid, lvl, sPtrBuf](OpBuilder &builder, Location loc,
		Value iv,
		MutableArrayRef<Value> reduc) {
		Value &nonEmpty = reduc[0];
		Value &minCrd = reduc[1];
		Value &curMemSz = reduc[2];

		Value pHi = builder.create<arith::AddIOp>(loc, iv, c1);
		Value sPLo = genIndexLoad(builder, loc, positionsBuffers[tid][lvl], iv);
		Value sPHi =
		genIndexLoad(builder, loc, positionsBuffers[tid][lvl], pHi);

		// isNonEmpty = isNonEmpty || lvlNonEmpty
		Value lvlNonEmpty = CMPI(ult, sPLo, sPHi);
		nonEmpty = builder.create<arith::OrIOp>(loc, lvlNonEmpty, nonEmpty);

		// Update minimal coordinate.
		auto ifNonEmpty = builder.create<scf::IfOp>(loc, builder.getIndexType(),
		lvlNonEmpty, true);
		{
		OpBuilder::InsertionGuard guard(builder);
		builder.setInsertionPointToStart(ifNonEmpty.thenBlock());
		Value curC =
		genIndexLoad(builder, loc, coordinatesBuffers[tid][lvl], sPLo);
		Value isSmaller = CMPI(ult, curC, minCrd);
		Value newMin =
		builder.create<arith::SelectOp>(loc, isSmaller, curC, minCrd);
		builder.create<scf::YieldOp>(loc, newMin);
		builder.setInsertionPointToStart(ifNonEmpty.elseBlock());
		builder.create<scf::YieldOp>(loc, minCrd);
		}
		minCrd = ifNonEmpty.getResult(0);
		// filles in
		builder.create<memref::StoreOp>(loc, sPLo, sPtrBuf, curMemSz);
		Value nxtMemSize = builder.create<arith::AddIOp>(loc, curMemSz, c1);
		builder.create<memref::StoreOp>(loc, sPHi, sPtrBuf, nxtMemSize);
		// curMemSize += 2
		curMemSz = builder.create<arith::AddIOp>(loc, curMemSz, c2);
		});

		unsigned depth = levelReducedDep[tid][lvl];
		Value size = sliceSizes[tid][lvl][depth];
		Value isNonEmpty = result[0];
		Value minCrd = result[1];
		// Two metadata [memSize, idx].
		// TODO: Can use an SSA value for these two metadata
		builder.create<memref::StoreOp>(loc, result[2], sPtrBuf, c0);
		builder.create<memref::StoreOp>(loc, c0, sPtrBuf, c1);
		// FIXME: we need the relative offset related to the base slice.
		Value absOffset = offsetFromMinCoord(builder, loc, minCrd, size, isNonEmpty);
		sliceStack[tid].emplace_back(minCrd, absOffset, isNonEmpty, lvl, depth + 1);
		}
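
Both `genResolvedSliceBegin` and `genUnResolvedSliceBegin` write the slice position buffer in the same layout, as far as the stores above show: two metadata slots `[memSize, idx]` followed by `[pLo, pHi]` pairs. The sketch below models that layout with a `std::vector`. The names `Segment`, `packSlicePositions` and `countNonEmptySegments` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Segment = std::pair<std::size_t, std::size_t>; // [pLo, pHi)

// Packs segments as [memSize, idx, pLo0, pHi0, pLo1, pHi1, ...].
// memSize counts every slot, including the two metadata entries, so a buffer
// with no segments has memSize == 2.
inline std::vector<std::size_t>
packSlicePositions(const std::vector<Segment> &segs) {
  std::vector<std::size_t> buf = {2, 0};
  for (const Segment &s : segs) {
    buf.push_back(s.first);
    buf.push_back(s.second);
    buf[0] += 2;
  }
  return buf;
}

// Walks the pairs the way the traversal loop does: start at slot 2, step by
// 2, stop at memSize. Counts the segments with pLo < pHi.
inline std::size_t countNonEmptySegments(const std::vector<std::size_t> &buf) {
  std::size_t n = 0;
  for (std::size_t i = 2; i < buf[0]; i += 2)
    if (buf[i] < buf[i + 1])
      ++n;
  return n;
}
```

A single segment `[0, 7)` packs to `{4, 0, 0, 7}`, which matches the `[memSize = 4, 0, 0, pHi]` comment in `genResolvedSliceBegin`.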

		bool LoopEmitter::genSliceBegin(OpBuilder &builder, Location loc, TensorId tid,
		Level lvl) {
		Value c1 = C_IDX(1), c2 = C_IDX(2);

		if (depFullyReduced(tid, lvl)) {
		// If constraints on the tensor is fully resolved. We do not need to
		// generates slice begin any more, instead we fall back to TACO-based
		// algorithm to (co)iterates over the slice.
		Value pLoPtr =
		genIndexLoad(builder, loc, slicePosBuffer[tid][lvl].back(), c1);
		pLoPtr = builder.create<arith::AddIOp>(loc, pLoPtr, c2);
		Value pHiPtr = builder.create<arith::AddIOp>(loc, pLoPtr, c1);
		posits[tid][lvl] =
		genIndexLoad(builder, loc, slicePosBuffer[tid][lvl].back(), pLoPtr);
		highs[tid][lvl] =
		genIndexLoad(builder, loc, slicePosBuffer[tid][lvl].back(), pHiPtr);
		return true;
		}
		aartbik: I first thought this was commented-out code ;-) So make it "Generate: code".

		// Only when the level is sorted, the next-non-empty slice can be computed
		// efficiently.
		const DimLevelType lvlType = lvlTypes[tid][lvl];
		assert(isOrderedDLT(lvlType));
		if (isSingletonDLT(lvlType)) {
		llvm_unreachable("TODO: dense level should be easy to support, while "
		"singleton level requres more efforts");
		}

		assert(!dependentLvlMap[tid][lvl].empty());
		assert(!sliceStack[tid].empty());

		const SliceInfo &sliceInfo = sliceStack[tid].back();
		auto baseEnc = getSparseTensorEncoding(tensors[tid].getType());
		if (baseEnc.isSlice())
		llvm_unreachable("TODO: not yet implemented");
		// Generate caches required to fast compute next-non-empty slices with
		// increasing offset for slice-base loop.
		// We do not need cache for dense levels.
		if (slicePosBuffer[tid][lvl][0] == nullptr && !isDenseDLT(lvlType)) {
		OpBuilder::InsertionGuard guard(builder);
		// The buffer can be reused, and the size is loop invariant: it only depends
		// on the iteration graph's toposort.
		builder.setInsertionPointAfter(localInsertPos);
		Value bufSize = C_IDX(1);
		Value c2 = C_IDX(2);
		// Accumlates the size required to cache the pLo for the slice.
		// E.g., if we want to cache the pIdx for slice<d0xd1xf64> on the second
		// level. We at most need to a memref<d0xindex>.
		// NOTE: this is apperantly an over-approximation when the previous
		// level is compressed, and we can compute a precise memory size
		// inside the loops. But that would also requires us to allocate/free
		// memorys in loops.
		// TODO: Maybe using allocaScopOp inside the loop to resolve the issue?
		for (Level curLevel = lvl;
		curLevel >= 1 && !lvlFullyResolved(tid, curLevel - 1); curLevel--) {
		auto depth = remDepOnLevel(tid, curLevel - 1);
		assert(sliceSizes[tid][lvl].size() >= depth);
		Value sz = *(sliceSizes[tid][lvl].rbegin() + depth - 1);
		bufSize = builder.create<arith::MulIOp>(loc, bufSize, sz);
		}
		// For a pair of [pLo, pHi]. Note that we can not compress pHi because slice
		// creates segments in the index buffer so that the pHi for the current
		// level is no longer the pLo for the next level.
		bufSize = builder.create<arith::MulIOp>(loc, bufSize, c2);
		// Additional two metadata {memSize, idx} at head.
		bufSize = builder.create<arith::AddIOp>(loc, bufSize, c2);
		llvm::for_each(
		slicePosBuffer[tid][lvl], [bufSize, loc, &builder](Value &cache) {
		cache = genAlloca(builder, loc, bufSize, builder.getIndexType());
		});
		}

		if (sliceInfo.isInitialTensor() ||
		(lvl >= 1 && lvlFullyResolved(tid, lvl - 1))) {
		// First level or previous level has been full resolved.
		genResolvedSliceBegin(builder, loc, tid, lvl);
		} else {
		// The previous level has not been full resolved.
		genUnResolvedSliceBegin(builder, loc, tid, lvl);
		}
		return false;
		}

		void LoopEmitter::genSliceNextInduction(OpBuilder &builder, Location loc,
		const Operation *op, TensorId tid,
		Level lvl,
		SmallVectorImpl<Value> &operands,
		unsigned &retIdx) {
		if (!isCompressedDLT(lvlTypes[tid][lvl]))
		llvm_unreachable("TODO");

		// else generate code to compute next non empty slice.
		Value c0 = C_IDX(0);
		Value c1 = C_IDX(1);
		Value c2 = C_IDX(2);

		auto whileOp = llvm::cast<scf::WhileOp>(op);
		SliceInfo &info = sliceStack[tid].back();
		assert(info.slicedOnLvl == lvl);

		//
		// We forward to the next non empty slice by
		// if (minCrd > offset) {
		// offset += 1
		// } else {
		// minCrd = nextMinInSlice();
		// offset = minCrd - size + 1;
		// }
		//
		// if (offset + size > parents.size)
		// isNonEmpty = false;
		//
		Value absOffset = info.offset;
		// Resets slices pointers as the resolved slices are invalidated after we
		// moves forward to the next slice.
		for (unsigned i = 0; i <= lvl; i++)
		builder.create<memref::StoreOp>(loc, c0, slicePosBuffer[tid][i].back(), c1);

		SmallVector<Value, 3> reduc = {info.minCrd, info.isNonEmpty, absOffset};
		Value sPtrBuf = slicePosBuffer[tid][lvl][info.depth - 1];
		Value fastPathP = CMPI(ugt, info.minCrd, absOffset);
		auto ifOp = builder.create<scf::IfOp>(loc, ValueRange(reduc).getTypes(),
		fastPathP, true);
		{
		OpBuilder::InsertionGuard guard(builder);
		// Take the fast path if minCrd > offset
		builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
		reduc[2] = builder.create<arith::AddIOp>(loc, absOffset, c1);
		// Yield offset + 1.
		builder.create<scf::YieldOp>(loc, reduc);

		// Else, take the slow path.
		builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
		reduc[2] = absOffset; // restore value.
		Value pSt = c2; // pointer starting index
		Value mSz = genIndexLoad(builder, loc, sPtrBuf, c0); // memSize
		reduc[0] = lvlSizes[tid][lvl]; // next min coord
		reduc[1] = constantI1(builder, loc, false); // isNonEmpty
		auto loopArgs = static_cast<ValueRange>(reduc).drop_back();
		auto forOp = scf::buildLoopNest(
		builder, loc, pSt, mSz, c2, loopArgs,
		[this, tid, lvl, c1, sPtrBuf,
		&info](OpBuilder &builder, Location loc, ValueRange ivs,
		ValueRange iterArgs) -> scf::ValueVector {
		Value curMinCrd = iterArgs[0];
		Value isNonEmpty = iterArgs[1];

		Type idxTp = builder.getIndexType();
		Value pLo = genIndexLoad(builder, loc, sPtrBuf, ivs.front());
		Value pHi =
		genIndexLoad(builder, loc, sPtrBuf,
		builder.create<arith::AddIOp>(loc, ivs.front(), c1));
		//
		// if pLo < pHi
		// coord = load[pLo]
		// if coord == minCrd
		// pLo += 1
		//
		// if pLo < pHi
		// curMinCrd = min(curMinCrd, load[pLo])
		//
		Value pred = CMPI(ult, pLo, pHi);
		auto advPLo = builder.create<scf::IfOp>(loc, idxTp, pred, true);
		/* if pLo < pHi */ {
		builder.setInsertionPointToStart(&advPLo.getThenRegion().front());
		// coord = load[pLo]
		Value coord =
		genIndexLoad(builder, loc, coordinatesBuffers[tid][lvl], pLo);
		Value pred = CMPI(eq, coord, info.minCrd);
		auto ifEqual = builder.create<scf::IfOp>(loc, idxTp, pred, true);
		/* if coord == minCrd */ {
		builder.setInsertionPointToStart(
		&ifEqual.getThenRegion().front());
		Value newPlo = builder.create<arith::AddIOp>(loc, pLo, c1);
		// Updates the cache.
		builder.create<memref::StoreOp>(loc, newPlo, sPtrBuf,
		ivs.front());
		builder.create<scf::YieldOp>(loc, newPlo);
		}
		/* else coord != minCrd */ {
		builder.setInsertionPointToStart(
		&ifEqual.getElseRegion().front());
		builder.create<scf::YieldOp>(loc, pLo);
		}
		builder.setInsertionPointAfter(ifEqual);
		builder.create<scf::YieldOp>(loc, ifEqual.getResults());
		}
		/* else pLo >= pHi */ {
		builder.setInsertionPointToStart(&advPLo.getElseRegion().front());
		builder.create<scf::YieldOp>(loc, pLo);
		}

		builder.setInsertionPointAfter(advPLo);
		pLo = advPLo.getResult(0);
		Value lvlNonEmpty = CMPI(ult, pLo, pHi);
		// Update minCrds
		auto newMin =
		builder.create<scf::IfOp>(loc, idxTp, lvlNonEmpty, true);
		builder.setInsertionPointToStart(&newMin.getThenRegion().front());
		builder.create<scf::YieldOp>(
		loc,
		genIndexLoad(builder, loc, coordinatesBuffers[tid][lvl], pLo));

		builder.setInsertionPointToStart(&newMin.getElseRegion().front());
		builder.create<scf::YieldOp>(loc, curMinCrd);
		builder.setInsertionPointAfter(newMin);

		// isNonEmpty = isNonEmpty || lvlNonEmpty
		isNonEmpty =
		builder.create<arith::OrIOp>(loc, lvlNonEmpty, isNonEmpty);
		curMinCrd = builder.create<arith::SelectOp>(
		loc, CMPI(ult, newMin.getResult(0), curMinCrd),
		newMin.getResult(0), curMinCrd);
		return {curMinCrd, isNonEmpty};
		});

		builder.setInsertionPointAfter(forOp.loops.front());
		// minOffset = minCrd + 1 >= size ? minCrd + 1 - size : c0
		Value tmp = builder.create<arith::AddIOp>(loc, forOp.results.front(), c1);
		Value minOffset = builder.create<arith::SubIOp>(
		loc, tmp, sliceSizes[tid][lvl][info.depth - 1]);
		Value p = CMPI(uge, tmp, sliceSizes[tid][lvl][info.depth - 1]);
		minOffset = builder.create<arith::SelectOp>(loc, p, minOffset, c0);
		SmallVector<Value, 3> yields;
		yields.assign(forOp.results.begin(), forOp.results.end());
		yields.push_back(minOffset);
		builder.create<scf::YieldOp>(loc, yields);
		}

		Value nextMinCrd = ifOp.getResults()[0];
		Value nextNonEmpty = ifOp.getResults()[1];

		// the next offset should at least be offset + 1;
		aartbik: The next
		Value minOffset = ifOp.getResults()[2];
		Value nxOffset = builder.create<arith::AddIOp>(loc, info.offset, c1);
		Value maxPred = CMPI(ugt, minOffset, nxOffset);
		Value nextAbsOffset =
		builder.create<arith::SelectOp>(loc, maxPred, minOffset, nxOffset);

		Value sliceUB = builder.create<arith::AddIOp>(
		loc, nextAbsOffset, sliceSizes[tid][lvl][info.depth - 1]);

		// FIXME: this only works if the parsent is the tensor, we should use the
		// parents slice size + parent offset.
		assert(info.depth - 1 == 0);
		// nextNonEmpty = nextNonEmpty && slice upper bound <= parent upperbound.
		nextNonEmpty = builder.create<arith::AndIOp>(
		loc, nextNonEmpty, CMPI(ule, sliceUB, lvlSizes[tid][lvl]));

		// FIXME: compute relative offset.
		assert(info.depth - 1 == 0);
		Value nextRelOffset = nextAbsOffset;
		nextRelOffset =
		builder.create<arith::SelectOp>(loc, nextNonEmpty, nextRelOffset, c0);

		operands.push_back(nextNonEmpty);
		operands.push_back(nextMinCrd);
		operands.push_back(nextAbsOffset); // we push the absolute offset.

		// Update the slice stack.
		info.isNonEmpty = whileOp.getResult(retIdx++);
		info.minCrd = whileOp.getResult(retIdx++);
		info.offset = whileOp.getResult(retIdx++);
		}
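
The pseudo-code comment at the top of `genSliceNextInduction` can be made concrete with a plain C++ model. This is a sketch of one reading of the emitted IR, not the patch itself: `SliceState`, `PosRange` and `nextNonEmptySlice` are hypothetical names, `segs` stands in for the cached `[pLo, pHi]` pairs, and `crds` stands in for the coordinates buffer of the level.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct SliceState {
  std::size_t minCrd;
  bool nonEmpty;
  std::size_t offset;
};
using PosRange = std::pair<std::size_t, std::size_t>; // cached [pLo, pHi)

// Advances to the next non-empty slice of width `size` on a level of extent
// `lvlSize`. `segs` is updated in place, as the cache is in the patch.
inline SliceState nextNonEmptySlice(SliceState s, std::vector<PosRange> &segs,
                                    const std::vector<std::size_t> &crds,
                                    std::size_t size, std::size_t lvlSize) {
  std::size_t minOffset;
  if (s.minCrd > s.offset) {
    // Fast path: the minimum coordinate is still ahead of the slice start.
    minOffset = s.offset + 1;
  } else {
    // Slow path: step past the old minimum in every segment, then take the
    // smallest coordinate that remains.
    std::size_t newMin = lvlSize;
    bool nonEmpty = false;
    for (PosRange &seg : segs) {
      if (seg.first < seg.second && crds[seg.first] == s.minCrd)
        ++seg.first;
      if (seg.first < seg.second) {
        nonEmpty = true;
        newMin = std::min(newMin, crds[seg.first]);
      }
    }
    s.minCrd = newMin;
    s.nonEmpty = nonEmpty;
    // minOffset = minCrd + 1 >= size ? minCrd + 1 - size : 0
    minOffset = (newMin + 1 >= size) ? newMin + 1 - size : 0;
  }
  // The next offset is at least offset + 1.
  s.offset = std::max(minOffset, s.offset + 1);
  // The slice must still fit inside the level.
  s.nonEmpty = s.nonEmpty && (s.offset + size <= lvlSize);
  return s;
}
```

Starting from `minCrd = 0`, `offset = 0` over coordinates `{0, 5, 6}` with `size = 3` and `lvlSize = 10`, the first call takes the slow path and moves to `minCrd = 5`, `offset = 3`. The second call takes the fast path and moves to `offset = 4`.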

		Operation *LoopEmitter::emitSliceDrivenLoopOverTensorAtLvl(
		OpBuilder &builder, Location loc, TensorId tid, Level lvl,
		MutableArrayRef<Value> reduc) {
		assert(!depFullyReduced(tid, lvl));
		SliceInfo &sliceInfo = sliceStack[tid].back();
		assert(sliceInfo.slicedOnLvl == lvl);

		// NOTE: The order matters!
		SmallVector<Value, 3> operands{sliceInfo.isNonEmpty, sliceInfo.minCrd,
		sliceInfo.offset};
		// number of reduction maintained by us.
		size_t numMetaReduc = operands.size();

		// Append user-required reduction values.
		operands.append(reduc.begin(), reduc.end());
		assert(operands.size() == numMetaReduc + reduc.size());

		auto whileOp = builder.create<scf::WhileOp>(
		loc, ValueRange(operands).getTypes(), operands,
		/*beforeBuilder=*/
		[](OpBuilder &builder, Location loc, ValueRange args) {
		builder.create<scf::ConditionOp>(loc, /*isNonEmpty*/ args[0], args);
		},
		/*afterBuilder=*/
		[this, tid, lvl, reduc, numMetaReduc,
		&sliceInfo](OpBuilder &builder, Location loc, ValueRange args) {
		assert(args.size() == reduc.size() + numMetaReduc);
		sliceInfo.isNonEmpty = args[0];
		sliceInfo.minCrd = args[1];
		sliceInfo.offset = args[2];
		// The slice offset is the coordinate.
		Value c = sliceInfo.offset;
		if (sliceInfo.depth > 1) {
		// Coord is the relative offset related to its parents.
		// Update c = absOffset[lvl][depth] - absOffset[lvl][depth - 1]
		llvm_unreachable("TODO: not yet implement");
		}
		coords[tid][lvl] = c;

		for (unsigned i = 0, e = reduc.size(); i < e; i++)
		reduc[i] = args[i + numMetaReduc];
		});

		// Increments the number of resolved constraints on tid.
		// levelReducedDep[tid][lvl]++;
		// Set the insertion point to while loop body.
		aartbik: Sets (or Increment), but use one style.
		builder.setInsertionPointToEnd(&whileOp.getAfter().front());
		return whileOp;
		}

		#undef CMPI
		#undef C_IDX

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp

[... 1,012 lines not shown ...]	if (!reducValue.empty()) {
rewriter.inlineBlockBefore(srcBlock, &*rewriter.getInsertionPoint(),		rewriter.inlineBlockBefore(srcBlock, &*rewriter.getInsertionPoint(),
args);		args);
}		}

for (Dimension d = 0; d < dimRank; d++) {		for (Dimension d = 0; d < dimRank; d++) {
// Link the reduction chain. Note that loop emitter update the reducValue		// Link the reduction chain. Note that loop emitter update the reducValue
// in place.		// in place.
loopEmitter.exitCurrentLoop(rewriter, loc, reducValue);		loopEmitter.exitCurrentLoop(rewriter, loc, reducValue);
loopEmitter.exitCurrentLoopSeq();		loopEmitter.exitCurrentLoopSeq(rewriter, loc);
}		}

// Replace the foreach operator with the value returned by the outtermost		// Replace the foreach operator with the value returned by the outtermost
// for loop.		// for loop.
rewriter.replaceOp(op, reducValue);		rewriter.replaceOp(op, reducValue);
return success();		return success();
}		}
};		};
[... 124 lines not shown ...]

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp

[... 77 lines not shown ...]
}		}

/// A helper class that visits an affine expression and tries to find an		/// A helper class that visits an affine expression and tries to find an
/// AffineDimExpr to which the corresponding iterator from a GenericOp matches		/// AffineDimExpr to which the corresponding iterator from a GenericOp matches
/// the desired iterator type.		/// the desired iterator type.
class AffineDimFinder : public AffineExprVisitor<AffineDimFinder> {		class AffineDimFinder : public AffineExprVisitor<AffineDimFinder> {
public:		public:
explicit AffineDimFinder(linalg::GenericOp op)		explicit AffineDimFinder(linalg::GenericOp op)
: iterTypes(op.getIteratorTypesArray()) {}		: iterTypes(op.getIteratorTypes()) {}
void visitDimExpr(AffineDimExpr expr) {		void visitDimExpr(AffineDimExpr expr) {
		aartbik: I know this was already there, but can we use `override` here to make it more clear that we are implementing the base visitor class? Or at the very least group all overrides into a `// Visitor method overrides. ...` section?
		Peiming: I added a comment. This is a non-virtual function, so I did not use `override` here.
if (pickedDim == nullptr || pickIterType == iterTypes[expr.getPosition()]) {		if (pickedDim == nullptr ||
		pickIterType == iterTypes[expr.getPosition()]
		.cast<linalg::IteratorTypeAttr>()
		.getValue()) {
pickedDim = expr;		pickedDim = expr;
}		}
}		}

/// Set the desired iterator type that we want to pick.		/// Set the desired iterator type that we want to pick.
void setPickedIterType(utils::IteratorType iterType) {		void setPickedIterType(utils::IteratorType iterType) {
pickIterType = iterType;		pickIterType = iterType;
}		}

/// Get the desired AffineDimExpr.		/// Get the desired AffineDimExpr.
AffineDimExpr getDimExpr() const { return pickedDim.cast<AffineDimExpr>(); }		AffineDimExpr getDimExpr() const { return pickedDim.cast<AffineDimExpr>(); }

private:		private:
/// The picked AffineDimExpr after visit. This must be stored as		/// The picked AffineDimExpr after visit. This must be stored as
/// `AffineExpr` rather than `AffineDimExpr`, because the latter		/// `AffineExpr` rather than `AffineDimExpr`, because the latter
/// doesn't have a default ctor.		/// doesn't have a default ctor.
AffineExpr pickedDim;		AffineExpr pickedDim;
/// The iterator type that we want.		/// The iterator type that we want.
utils::IteratorType pickIterType;		utils::IteratorType pickIterType;
/// The mapping between dim=>iterator type.		/// The mapping between dim=>iterator type.
SmallVector<utils::IteratorType> iterTypes;		ArrayAttr iterTypes;
};		};
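
On the `override` exchange above: `AffineExprVisitor` takes the derived class as a template argument, and per Peiming's reply its visit methods are non-virtual, so a subclass shadows them rather than overriding them. The following is a minimal sketch of that pattern, not MLIR code; `VisitorBase`, `DimNode`, `ConstNode`, `DimCollector`, and `collectDims` are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for expression kinds; these are not MLIR types.
struct DimNode { unsigned position; };
struct ConstNode { long value; };

// CRTP base: dispatch is resolved at compile time via static_cast,
// so none of the visit methods are virtual.
template <typename SubClass>
class VisitorBase {
public:
  void visit(const DimNode &n) { static_cast<SubClass *>(this)->visitDim(n); }
  void visit(const ConstNode &n) {
    static_cast<SubClass *>(this)->visitConst(n);
  }

  // Default no-op handlers that a subclass may shadow.
  void visitDim(const DimNode &) {}
  void visitConst(const ConstNode &) {}
};

class DimCollector : public VisitorBase<DimCollector> {
public:
  // Visitor method "overrides" (shadowing, not virtual overriding).
  // Adding the `override` keyword here would be a compile error,
  // because the base declaration is not virtual.
  void visitDim(const DimNode &n) { positions.push_back(n.position); }

  std::vector<unsigned> positions;
};

// Hypothetical helper that runs the collector over a list of nodes.
inline std::vector<unsigned> collectDims(const std::vector<DimNode> &dims) {
  DimCollector c;
  for (const DimNode &d : dims)
    c.visit(d);
  c.visit(ConstNode{7}); // handled by the base no-op
  assert(c.positions.size() == dims.size());
  return c.positions;
}
```

Under these assumptions, a grouping comment such as `// Visitor method overrides.` documents the intent without relying on a keyword the compiler would reject.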

// Flattens an affine expression into a list of AffineDimExprs.		// Flattens an affine expression into a list of AffineDimExprs.
struct AffineDimCollector : public AffineExprVisitor<AffineDimCollector> {		struct AffineDimCollector : public AffineExprVisitor<AffineDimCollector> {
void visitDimExpr(AffineDimExpr expr) { dims.push_back(expr); }		void visitDimExpr(AffineDimExpr expr) { dims.push_back(expr); }
SmallVector<AffineDimExpr> dims;		SmallVector<AffineDimExpr> dims;
};		};

▲ Show 20 Lines • Show All 183 Lines • ▼ Show 20 Lines	if (isSubExp) {
// E.g.,		// E.g.,
// `d0 + d1` for indexing t0[lvl0] and `d0 + d2` for indexing t1[lvl0]		// `d0 + d1` for indexing t0[lvl0] and `d0 + d2` for indexing t1[lvl0]
// d0_1 = getNextSliceOffset t0 along lvl0		// d0_1 = getNextSliceOffset t0 along lvl0
// d0_2 = getNextSliceOffset t1 along lvl0		// d0_2 = getNextSliceOffset t1 along lvl0
// if d0_1 == d0_2 then d0 = d0_1 = d0_1		// if d0_1 == d0_2 then d0 = d0_1 = d0_1
// else increase min(d0_1, d0_2).		// else increase min(d0_1, d0_2).
return false;		return false;
}		}
merger.setLoopDependentTensorLevel(ldx, tensor, lvl);		merger.setLoopDependentTensorLevel(ldx, tensor, lvl, dlt);
}		}
return true;		return true;
}		}
case AffineExprKind::Constant:		case AffineExprKind::Constant:
case AffineExprKind::Mul:		case AffineExprKind::Mul:
// TODO: Support Mul and Constant AffineExp for slice-based codegen		// TODO: Support Mul and Constant AffineExp for slice-based codegen
return false;		return false;
case AffineExprKind::Add: {		case AffineExprKind::Add: {
▲ Show 20 Lines • Show All 451 Lines • ▼ Show 20 Lines	static bool computeIterationGraph(CodegenEnv &env, SortMask mask,
std::vector<std::vector<bool>> adjM(numLoops,		std::vector<std::vector<bool>> adjM(numLoops,
std::vector<bool>(numLoops, false));		std::vector<bool>(numLoops, false));
std::vector<unsigned> inDegree(numLoops, 0); // in-degree of each node.		std::vector<unsigned> inDegree(numLoops, 0); // in-degree of each node.
const auto iteratorTypes = env.op().getIteratorTypesArray();		const auto iteratorTypes = env.op().getIteratorTypesArray();
// Iterate over the indexing maps of every tensor in the tensor expression.		// Iterate over the indexing maps of every tensor in the tensor expression.
for (OpOperand &t : env.op()->getOpOperands()) {		for (OpOperand &t : env.op()->getOpOperands()) {
// Get map and encoding.		// Get map and encoding.
const auto enc = getSparseTensorEncoding(t.get().getType());		const auto enc = getSparseTensorEncoding(t.get().getType());
assert(env.op().getMatchingIndexingMap(&t).getNumDims() +
getNumNonTrivialIdxExpOnSparseLvls(env.op()) ==
numLoops);

// Skips dense inputs/outputs when not requested.		// Skips dense inputs/outputs when not requested.
const bool isDenseInput = !enc && env.op().isDpsInput(&t);		const bool isDenseInput = !enc && env.op().isDpsInput(&t);
const bool isDenseOutput = !enc && !isDenseInput;		const bool isDenseOutput = !enc && !isDenseInput;
if ((isDenseInput && !includesDenseInput(mask)) ||		if ((isDenseInput && !includesDenseInput(mask)) ||
(isDenseOutput && !includesDenseOutput(mask)))		(isDenseOutput && !includesDenseOutput(mask)))
continue;		continue;

// Push unrelated loops into sparse iteration space, so these		// Push unrelated loops into sparse iteration space, so these
▲ Show 20 Lines • Show All 704 Lines • ▼ Show 20 Lines	static bool startLoopSeq(CodegenEnv &env, OpBuilder &builder, ExprId exp,
});		});

env.emitter().enterNewLoopSeq(builder, env.op().getLoc(), tids, lvls);		env.emitter().enterNewLoopSeq(builder, env.op().getLoc(), tids, lvls);

// Maintain the universal index only if it is actually		// Maintain the universal index only if it is actually
// consumed by a subsequent lattice point.		// consumed by a subsequent lattice point.
if (needsUniv) {		if (needsUniv) {
for (const LatPointId li : env.set(lts).drop_front())		for (const LatPointId li : env.set(lts).drop_front())
if (!env.merger().hasAnySparse(env.lat(li).simple) &&		if (!env.merger().hasAnySparse(env.lat(li).simple))
!env.merger().hasSparseIdxReduction(env.lat(li).simple))
return true;		return true;
}		}
return false;		return false;
}		}

static void genConstantDenseAddressFromLevel(CodegenEnv &env,		static void genConstantDenseAddressFromLevel(CodegenEnv &env,
OpBuilder &builder, TensorId tid,		OpBuilder &builder, TensorId tid,
Level startLvl) {		Level startLvl) {
▲ Show 20 Lines • Show All 164 Lines • ▼ Show 20 Lines	if (tid != env.merger().getOutTensorID())
genConstantDenseAddressFromLevel(env, builder, tid, lvl + 1);		genConstantDenseAddressFromLevel(env, builder, tid, lvl + 1);
}		}

return std::make_pair(loop, isSingleCond);		return std::make_pair(loop, isSingleCond);
}		}

/// Ends a single loop in current sequence. Returns new values for needsUniv.		/// Ends a single loop in current sequence. Returns new values for needsUniv.
static bool endLoop(CodegenEnv &env, RewriterBase &rewriter, Operation *loop,		static bool endLoop(CodegenEnv &env, RewriterBase &rewriter, Operation *loop,
LoopId idx, LatPointId li, bool needsUniv) {		LoopId idx, LatPointId li, bool needsUniv,
// End a while-loop.		bool isSingleCond) {
if (auto whileOp = dyn_cast<scf::WhileOp>(loop)) {
finalizeWhileOp(env, rewriter, idx, needsUniv, whileOp);		if (isSingleCond) {
} else if (auto forOp = dyn_cast<scf::ForOp>(loop)) {		// Could be a for-loop or a while-loop for iterating over slice.
		aartbik: I would start with the same comment as in the else, and state it in the affirmative rather than the speculative: `// End either a for-loop or a while-loop that iterates over a slice.`
// Any iteration of a reduction for-loop creates a valid lex insert.		// Any iteration creates a valid lex insert.
if (env.isReduc() && env.getValidLexInsert())		if (env.isReduc() && env.getValidLexInsert())
env.setValidLexInsert(constantI1(rewriter, env.op().getLoc(), true));		env.setValidLexInsert(constantI1(rewriter, env.op().getLoc(), true));
		} else if (auto whileOp = dyn_cast<scf::WhileOp>(loop)) {
		// End a while-loop.
		finalizeWhileOp(env, rewriter, idx, needsUniv, whileOp);
} else {		} else {
needsUniv = false;		needsUniv = false;
}		}

env.genLoopBoundary([&](MutableArrayRef<Value> reduc) {		env.genLoopBoundary([&](MutableArrayRef<Value> reduc) {
env.emitter().exitCurrentLoop(rewriter, env.op().getLoc(), reduc);		env.emitter().exitCurrentLoop(rewriter, env.op().getLoc(), reduc);
return std::nullopt;		return std::nullopt;
});		});

return needsUniv;		return needsUniv;
}		}

/// Ends a loop sequence at given level.		/// Ends a loop sequence at given level.
static void endLoopSeq(CodegenEnv &env, OpBuilder &builder, ExprId exp,		static void endLoopSeq(CodegenEnv &env, OpBuilder &builder, unsigned exp,
LoopOrd at, LoopId idx, LoopId ldx) {		unsigned at, unsigned idx, unsigned ldx) {
assert(!env.getLoopVar(idx));		assert(!env.getLoopVar(idx));
env.emitter().exitCurrentLoopSeq();		env.emitter().exitCurrentLoopSeq(builder, env.op().getLoc());
// Unmark bookkeeping of invariants and loop index.		// Unmark bookkeeping of invariants and loop index.
genInvariants(env, builder, exp, ldx, /*atStart=*/false);		genInvariants(env, builder, exp, ldx, /*atStart=*/false);
// Finalize access pattern expansion for sparse tensor output.		// Finalize access pattern expansion for sparse tensor output.
genExpand(env, builder, at, /*atStart=*/false);		genExpand(env, builder, at, /*atStart=*/false);
}		}

/// Recursively generates code while computing iteration lattices in order		/// Recursively generates code while computing iteration lattices in order
/// to manage the complexity of implementing co-iteration over unions		/// to manage the complexity of implementing co-iteration over unions
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	for (unsigned j = 0; j < lsize; j++) {
endIf(env, rewriter, ifOp, loop, redInput, cntInput, insInput);		endIf(env, rewriter, ifOp, loop, redInput, cntInput, insInput);
} else {		} else {
genStmt(env, rewriter, ej, at + 1);		genStmt(env, rewriter, ej, at + 1);
}		}
}		}
}		}

// End a loop.		// End a loop.
needsUniv = endLoop(env, rewriter, loop, idx, li, needsUniv);		needsUniv = endLoop(env, rewriter, loop, idx, li, needsUniv, isSingleCond);
}		}

// End a loop sequence.		// End a loop sequence.
endLoopSeq(env, rewriter, exp, at, idx, ldx);		endLoopSeq(env, rewriter, exp, at, idx, ldx);
}		}

/// Converts the result computed by the sparse kernel into the required form.		/// Converts the result computed by the sparse kernel into the required form.
static void genResult(CodegenEnv &env, RewriterBase &rewriter) {		static void genResult(CodegenEnv &env, RewriterBase &rewriter) {

mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp

Show First 20 Lines • Show All 207 Lines • ▼ Show 20 Lines	: outTensor(numInputOutputTensors - 1),
numTensors(numInputOutputTensors + 1), numNativeLoops(numNativeLoops),		numTensors(numInputOutputTensors + 1), numNativeLoops(numNativeLoops),
numLoops(numNativeLoops + numFilterLoops), hasSparseOut(false),		numLoops(numNativeLoops + numFilterLoops), hasSparseOut(false),
lvlTypes(numTensors,		lvlTypes(numTensors,
std::vector<DimLevelType>(numLoops, DimLevelType::Undef)),		std::vector<DimLevelType>(numLoops, DimLevelType::Undef)),
loopToLvl(numTensors,		loopToLvl(numTensors,
std::vector<std::optional<Level>>(numLoops, std::nullopt)),		std::vector<std::optional<Level>>(numLoops, std::nullopt)),
lvlToLoop(numTensors,		lvlToLoop(numTensors,
std::vector<std::optional<LoopId>>(maxLvlRank, std::nullopt)),		std::vector<std::optional<LoopId>>(maxLvlRank, std::nullopt)),
loopToDependencies(numLoops, std::vector<std::optional<Level>>(		loopToDependencies(
		numLoops, std::vector<std::optional<std::pair<Level, DimLevelType>>>(
numTensors, std::nullopt)),		numTensors, std::nullopt)),
levelToDependentIdx(numTensors, std::vector<std::vector<LoopId>>(		levelToDependentLoop(numTensors, std::vector<std::vector<LoopId>>(
maxLvlRank, std::vector<LoopId>())),		maxLvlRank, std::vector<LoopId>())),
loopBounds(numLoops, std::make_pair(numTensors, numLoops)) {}		loopBounds(numLoops, std::make_pair(numTensors, numLoops)) {}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Lattice methods.		// Lattice methods.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

ExprId Merger::addTensorExp(TensorId t) {		ExprId Merger::addTensorExp(TensorId t) {
assert(isValidTensorId(t));		assert(isValidTensorId(t));
▲ Show 20 Lines • Show All 163 Lines • ▼ Show 20 Lines	BitVector Merger::simplifyCond(LatSetId s0, LatPointId p0) {
bool isSingleton = true;		bool isSingleton = true;
for (const LatPointId p1 : set(s0)) {		for (const LatPointId p1 : set(s0)) {
if (p0 != p1 && latGT(p0, p1)) {		if (p0 != p1 && latGT(p0, p1)) {
isSingleton = false;		isSingleton = false;
break;		break;
}		}
}		}

BitVector simple(lat(p0).bits);		BitVector simple(latPoints[p0].bits);
bool reset =		bool reset = isSingleton && hasAnySparse(simple);
isSingleton && (hasAnySparse(simple) || hasSparseIdxReduction(simple));		const TensorLoopId be = simple.size();
// `be`, `b`, and `offset` are `TensorLoopId` in spirit; but we avoid		TensorLoopId offset = 0; // relative to the end
// using that class in this function because we need to do a bunch of
// arithmetic on them, so using the newtype would introduce too much
// boilerplate.
const unsigned be = simple.size();
unsigned offset = 0; // relative to the end
if (!reset)		if (!reset)
// Starts resetting from a dense level, so that the first bit (if kept)		// Starts resetting from a dense level, so that the first bit (if kept)
// is not undefined level-type.		// is not undefined level-type.
for (unsigned b = 0; b < be; b++) {		for (unsigned b = 0; b < be; b++) {
if (simple[b] && isDenseDLT(getDimLevelType(TensorLoopId{b}))) {		if (simple[b] && isDenseDLT(getDimLevelType(TensorLoopId{b}))) {
offset = be - b - 1; // relative to the end		offset = be - b - 1; // relative to the end
break;		break;
}		}
}		}

// Now apply the two basic rules. We also iterate the bits reversely to always		// Now apply the two basic rules. We also iterate the bits reversely to always
// keep the rightmost bit (which could possibly be a synthetic tensor).		// keep the rightmost bit (which could possibly be a synthetic tensor).
for (unsigned b = be - 1 - offset, i = 0; i < be;		for (unsigned b = be - 1 - offset, i = 0; i < be;
b = b == 0 ? be - 1 : b - 1, i++) {		b = b == 0 ? be - 1 : b - 1, i++) {
// FIXME: better name? also slice on dense level has locate property as		// Slice on dense level has locate property as well, and can be optimized.
// well. Handle it correctly!		if (simple[b] && !isSparseLvlWithNonTrivialIdxExp(b)) {
if (simple[b] && !isLvlWithNonTrivialIdxExp(TensorLoopId{b})) {		const auto dlt = getDimLevelType(b);
		aartbik: has the `locate` property as well
const auto dlt = getDimLevelType(TensorLoopId{b});
if (!isCompressedDLT(dlt) && !isSingletonDLT(dlt)) {		if (!isCompressedDLT(dlt) && !isSingletonDLT(dlt)) {
if (reset)		if (reset)
simple.reset(b);		simple.reset(b);
reset = true;		reset = true;
}		}
}		}
}		}
return simple;		return simple;
}		}
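
The loop header in `simplifyCond` above walks the bit positions downward from `be - 1 - offset` and wraps around once it reaches zero, so every position is visited exactly once. A minimal standalone sketch of just that traversal order follows; the helper name `reverseCircularOrder` is hypothetical and the sketch leaves out the level-type checks that the real function performs.

```cpp
#include <cassert>
#include <vector>

// Returns the order in which bit positions are visited when starting at
// `be - 1 - offset` and stepping downward with wrap-around. The loop header
// is copied from the diff; `offset` is relative to the end.
inline std::vector<unsigned> reverseCircularOrder(unsigned be,
                                                  unsigned offset) {
  assert(be > 0 && offset < be);
  std::vector<unsigned> order;
  order.reserve(be);
  for (unsigned b = be - 1 - offset, i = 0; i < be;
       b = b == 0 ? be - 1 : b - 1, i++)
    order.push_back(b);
  return order;
}
```

With `be = 5` and `offset = 1` the visit order is 3, 2, 1, 0, 4. With `offset = 0` the walk starts at the rightmost position, which matches the comment about always keeping the rightmost bit.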

bool Merger::latGT(LatPointId i, LatPointId j) const {		bool Merger::latGT(LatPointId i, LatPointId j) const {
const BitVector &bitsi = lat(i).bits;		const BitVector &bitsi = lat(i).bits;
const BitVector &bitsj = lat(j).bits;		const BitVector &bitsj = lat(j).bits;
assert(bitsi.size() == bitsj.size());		assert(bitsi.size() == bitsj.size());
if (bitsi.count() > bitsj.count()) {		if (bitsi.count() > bitsj.count()) {
for (TensorLoopId b = 0, be = bitsj.size(); b < be; b++)		for (TensorLoopId b = 0, be = bitsj.size(); b < be; b++)
if (bitsj[b] && !bitsi[b])		if (bitsj[b] && !bitsi[b])
return false;		return false;
return true;		return true;
}		}
return false;		return false;
}		}
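
As written, `latGT(i, j)` returns true only when the bits of `i` have a strictly larger population count than those of `j` and every bit set in `j` is also set in `i`. A minimal sketch of the same test on `std::vector<bool>` follows, assuming plain vectors stand in for `BitVector`; `countSet` and `strictlyCovers` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the set bits in a plain bit vector.
inline std::size_t countSet(const std::vector<bool> &bits) {
  std::size_t n = 0;
  for (bool b : bits)
    n += b ? 1 : 0;
  return n;
}

// True when `bitsI` has more set bits than `bitsJ` and contains every bit
// that is set in `bitsJ`, following the structure of latGT in the diff.
inline bool strictlyCovers(const std::vector<bool> &bitsI,
                           const std::vector<bool> &bitsJ) {
  assert(bitsI.size() == bitsJ.size());
  if (countSet(bitsI) <= countSet(bitsJ))
    return false;
  for (std::size_t b = 0, be = bitsJ.size(); b < be; b++)
    if (bitsJ[b] && !bitsI[b])
      return false;
  return true;
}
```

For example, `{1,1,0}` covers `{1,0,0}`, but it does not cover `{0,0,1}` (a bit of `j` is missing from `i`) and it does not cover itself (equal counts).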

bool Merger::onlyDenseDiff(LatPointId i, LatPointId j) const {		bool Merger::onlyDenseDiff(LatPointId i, LatPointId j) const {
BitVector tmp(lat(j).bits);		BitVector tmp(latPoints[j].bits);
tmp ^= lat(i).bits;		tmp ^= latPoints[i].bits;
return !hasAnySparse(tmp) && !hasSparseIdxReduction(tmp);		return !hasAnySparse(tmp);
}		}

bool Merger::expContainsTensor(ExprId e, TensorId t) const {		bool Merger::expContainsTensor(ExprId e, TensorId t) const {
const auto &expr = exp(e);		const auto &expr = exp(e);
if (expr.kind == TensorExp::Kind::kTensor)		if (expr.kind == TensorExp::Kind::kTensor)
return expr.tensor == t;		return expr.tensor == t;

switch (getExpArity(expr.kind)) {		switch (getExpArity(expr.kind)) {
▲ Show 20 Lines • Show All 125 Lines • ▼ Show 20 Lines	bool Merger::isSingleCondition(TensorId t, ExprId e) const {
case TensorExp::Kind::kBinary:		case TensorExp::Kind::kBinary:
case TensorExp::Kind::kReduce:		case TensorExp::Kind::kReduce:
return false;		return false;
}		}
llvm_unreachable("unexpected kind");		llvm_unreachable("unexpected kind");
}		}

bool Merger::hasAnySparse(const BitVector &bits) const {		bool Merger::hasAnySparse(const BitVector &bits) const {
for (TensorLoopId b = 0, be = bits.size(); b < be; b++)		for (TensorLoopId b : bits.set_bits()) {
if (bits[b]) {
const auto dlt = getDimLevelType(b);		const auto dlt = getDimLevelType(b);
if (isCompressedDLT(dlt) || isSingletonDLT(dlt))		if (isCompressedDLT(dlt) || isSingletonDLT(dlt))
return true;		return true;
}		}
return false;		return hasSparseIdxReduction(bits);
}		}

bool Merger::hasSparseIdxReduction(const BitVector &bits) const {		bool Merger::hasSparseIdxReduction(const BitVector &bits) const {
// TODO: return false on dense levels.		for (TensorLoopId b : bits.set_bits())
for (unsigned b = 0, be = bits.size(); b < be; b++)		if (isSparseLvlWithNonTrivialIdxExp(b))
if (bits[b] && isLvlWithNonTrivialIdxExp(b))
return true;		return true;
return false;		return false;
}		}

#ifndef NDEBUG		#ifndef NDEBUG

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Print methods (for debugging).		// Print methods (for debugging).

mlir/test/Dialect/SparseTensor/sparse_conv_2d_slice_based.mlir

This file was added.

				// RUN: mlir-opt %s --sparsification="enable-index-reduction=true" --cse | FileCheck %s

				#map = affine_map<(d0, d1, d2, d3) -> (d0 + d2, d1 + d3)>
				#map1 = affine_map<(d0, d1, d2, d3) -> (d2, d3)>
				#map2 = affine_map<(d0, d1, d2, d3) -> (d0, d1)>

				#DCSR = #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>

				// CHECK-LABEL: func.func @conv2d_all_sparse_CSR(
				// CHECK-SAME: %[[VAL_0:.*]]: tensor<8x8xi32, #{{.*}}>,
				// CHECK-SAME: %[[VAL_1:.*]]: tensor<3x3xi32>) -> tensor<6x6xi32, #{{.*}}> {
				// CHECK-DAG: %[[VAL_2:.*]] = arith.constant 8 : index
				// CHECK-DAG: %[[VAL_3:.*]] = arith.constant 3 : index
				// CHECK-DAG: %[[VAL_4:.*]] = arith.constant 0 : index
				// CHECK-DAG: %[[VAL_5:.*]] = arith.constant 1 : index
				// CHECK-DAG: %[[VAL_6:.*]] = arith.constant 2 : index
				// CHECK-DAG: %[[VAL_7:.*]] = arith.constant 4 : index
				// CHECK-DAG: %[[VAL_8:.*]] = arith.constant 0 : i32
				// CHECK-DAG: %[[VAL_9:.*]] = arith.constant true
				// CHECK-DAG: %[[VAL_10:.*]] = arith.constant false
				// CHECK-DAG: %[[VAL_11:.*]] = bufferization.alloc_tensor() : tensor<6x6xi32, #{{.*}}>
				// CHECK-DAG: %[[VAL_12:.*]] = sparse_tensor.positions %[[VAL_0]] {level = 0 : index} : tensor<8x8xi32, #{{.*}}> to memref<?xindex>
				// CHECK-DAG: %[[VAL_13:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 0 : index} : tensor<8x8xi32, #{{.*}}> to memref<?xindex>
				// CHECK-DAG: %[[VAL_14:.*]] = sparse_tensor.positions %[[VAL_0]] {level = 1 : index} : tensor<8x8xi32, #{{.*}}> to memref<?xindex>
				// CHECK-DAG: %[[VAL_15:.*]] = sparse_tensor.coordinates %[[VAL_0]] {level = 1 : index} : tensor<8x8xi32, #{{.*}}> to memref<?xindex>
				// CHECK-DAG: %[[VAL_16:.*]] = sparse_tensor.values %[[VAL_0]] : tensor<8x8xi32, #{{.*}}> to memref<?xi32>
				// CHECK-DAG: %[[VAL_17:.*]] = bufferization.to_memref %[[VAL_1]] : memref<3x3xi32>
				// CHECK-DAG: %[[VAL_18:.*]] = memref.alloca(%[[VAL_2]]) : memref<?xindex>
				// CHECK-DAG: %[[VAL_19:.*]] = memref.alloca(%[[VAL_7]]) : memref<?xindex>
				// CHECK: %[[VAL_20:.*]] = memref.load %[[VAL_12]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_7]], %[[VAL_19]]{{\[}}%[[VAL_4]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_4]], %[[VAL_19]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_4]], %[[VAL_19]]{{\[}}%[[VAL_6]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_20]], %[[VAL_19]]{{\[}}%[[VAL_3]]] : memref<?xindex>
				// CHECK: %[[VAL_21:.*]] = arith.cmpi ugt, %[[VAL_20]], %[[VAL_4]] : index
				// CHECK: %[[VAL_22:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_4]]] : memref<?xindex>
				// CHECK: %[[VAL_23:.*]] = arith.cmpi uge, %[[VAL_22]], %[[VAL_3]] : index
				// CHECK: %[[VAL_24:.*]] = arith.andi %[[VAL_21]], %[[VAL_23]] : i1
				// CHECK: %[[VAL_25:.*]] = arith.addi %[[VAL_22]], %[[VAL_5]] : index
				// CHECK: %[[VAL_26:.*]] = arith.subi %[[VAL_25]], %[[VAL_3]] : index
				// CHECK: %[[VAL_27:.*]] = arith.select %[[VAL_24]], %[[VAL_26]], %[[VAL_4]] : index
				// CHECK: %[[VAL_28:.*]]:4 = scf.while (%[[VAL_29:.*]] = %[[VAL_21]], %[[VAL_30:.*]] = %[[VAL_22]], %[[VAL_31:.*]] = %[[VAL_27]], %[[VAL_32:.*]] = %[[VAL_11]]) : (i1, index, index, tensor<6x6xi32, #{{.*}}>) -> (i1, index, index, tensor<6x6xi32, #{{.*}}>) {
				// CHECK: scf.condition(%[[VAL_29]]) %[[VAL_29]], %[[VAL_30]], %[[VAL_31]], %[[VAL_32]] : i1, index, index, tensor<6x6xi32, #{{.*}}>
				// CHECK: } do {
				// CHECK: ^bb0(%[[VAL_33:.*]]: i1, %[[VAL_34:.*]]: index, %[[VAL_35:.*]]: index, %[[VAL_36:.*]]: tensor<6x6xi32, #{{.*}}>):
				// CHECK: %[[VAL_37:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_4]]] : memref<?xindex>
				// CHECK: %[[VAL_38:.*]]:3 = scf.for %[[VAL_39:.*]] = %[[VAL_6]] to %[[VAL_37]] step %[[VAL_6]] iter_args(%[[VAL_40:.*]] = %[[VAL_10]], %[[VAL_41:.*]] = %[[VAL_2]], %[[VAL_42:.*]] = %[[VAL_6]]) -> (i1, index, index) {
				// CHECK: %[[VAL_43:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_39]]] : memref<?xindex>
				// CHECK: %[[VAL_44:.*]] = arith.addi %[[VAL_39]], %[[VAL_5]] : index
				// CHECK: %[[VAL_45:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_44]]] : memref<?xindex>
				// CHECK: %[[VAL_46:.*]] = arith.addi %[[VAL_35]], %[[VAL_3]] : index
				// CHECK: %[[VAL_47:.*]]:5 = scf.while (%[[VAL_48:.*]] = %[[VAL_43]], %[[VAL_49:.*]] = %[[VAL_9]], %[[VAL_50:.*]] = %[[VAL_40]], %[[VAL_51:.*]] = %[[VAL_41]], %[[VAL_52:.*]] = %[[VAL_42]]) : (index, i1, i1, index, index) -> (index, i1, i1, index, index) {
				// CHECK: %[[VAL_53:.*]] = arith.cmpi ult, %[[VAL_48]], %[[VAL_45]] : index
				// CHECK: %[[VAL_54:.*]] = arith.andi %[[VAL_49]], %[[VAL_53]] : i1
				// CHECK: scf.condition(%[[VAL_54]]) %[[VAL_48]], %[[VAL_49]], %[[VAL_50]], %[[VAL_51]], %[[VAL_52]] : index, i1, i1, index, index
				// CHECK: } do {
				// CHECK: ^bb0(%[[VAL_55:.*]]: index, %[[VAL_56:.*]]: i1, %[[VAL_57:.*]]: i1, %[[VAL_58:.*]]: index, %[[VAL_59:.*]]: index):
				// CHECK: %[[VAL_60:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_55]]] : memref<?xindex>
				// CHECK: %[[VAL_61:.*]] = arith.cmpi ult, %[[VAL_60]], %[[VAL_46]] : index
				// CHECK: %[[VAL_62:.*]]:3 = scf.if %[[VAL_61]] -> (i1, index, index) {
				// CHECK: %[[VAL_63:.*]] = arith.addi %[[VAL_55]], %[[VAL_5]] : index
				// CHECK: %[[VAL_64:.*]] = memref.load %[[VAL_14]]{{\[}}%[[VAL_55]]] : memref<?xindex>
				// CHECK: %[[VAL_65:.*]] = memref.load %[[VAL_14]]{{\[}}%[[VAL_63]]] : memref<?xindex>
				// CHECK: %[[VAL_66:.*]] = arith.cmpi ult, %[[VAL_64]], %[[VAL_65]] : index
				// CHECK: %[[VAL_67:.*]] = arith.ori %[[VAL_66]], %[[VAL_57]] : i1
				// CHECK: %[[VAL_68:.*]] = scf.if %[[VAL_66]] -> (index) {
				// CHECK: %[[VAL_69:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_64]]] : memref<?xindex>
				// CHECK: %[[VAL_70:.*]] = arith.cmpi ult, %[[VAL_69]], %[[VAL_58]] : index
				// CHECK: %[[VAL_71:.*]] = arith.select %[[VAL_70]], %[[VAL_69]], %[[VAL_58]] : index
				// CHECK: scf.yield %[[VAL_71]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_58]] : index
				// CHECK: }
				// CHECK: memref.store %[[VAL_64]], %[[VAL_18]]{{\[}}%[[VAL_59]]] : memref<?xindex>
				// CHECK: %[[VAL_72:.*]] = arith.addi %[[VAL_59]], %[[VAL_5]] : index
				// CHECK: memref.store %[[VAL_65]], %[[VAL_18]]{{\[}}%[[VAL_72]]] : memref<?xindex>
				// CHECK: %[[VAL_73:.*]] = arith.addi %[[VAL_59]], %[[VAL_6]] : index
				// CHECK: scf.yield %[[VAL_67]], %[[VAL_74:.*]], %[[VAL_73]] : i1, index, index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_57]], %[[VAL_58]], %[[VAL_59]] : i1, index, index
				// CHECK: } {"Emitted from" = "slice"}
				// CHECK: %[[VAL_75:.*]] = arith.addi %[[VAL_55]], %[[VAL_5]] : index
				// CHECK: scf.yield %[[VAL_75]], %[[VAL_61]], %[[VAL_76:.*]]#0, %[[VAL_76]]#1, %[[VAL_76]]#2 : index, i1, i1, index, index
				// CHECK: }
				// CHECK: scf.yield %[[VAL_77:.*]]#2, %[[VAL_77]]#3, %[[VAL_77]]#4 : i1, index, index
				// CHECK: }
				// CHECK: memref.store %[[VAL_78:.*]]#2, %[[VAL_18]]{{\[}}%[[VAL_4]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_4]], %[[VAL_18]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_79:.*]] = arith.cmpi uge, %[[VAL_78]]#1, %[[VAL_3]] : index
				// CHECK: %[[VAL_80:.*]] = arith.andi %[[VAL_78]]#0, %[[VAL_79]] : i1
				// CHECK: %[[VAL_81:.*]] = arith.addi %[[VAL_78]]#1, %[[VAL_5]] : index
				// CHECK: %[[VAL_82:.*]] = arith.subi %[[VAL_81]], %[[VAL_3]] : index
				// CHECK: %[[VAL_83:.*]] = arith.select %[[VAL_80]], %[[VAL_82]], %[[VAL_4]] : index
				// CHECK: %[[VAL_84:.*]]:4 = scf.while (%[[VAL_85:.*]] = %[[VAL_78]]#0, %[[VAL_86:.*]] = %[[VAL_78]]#1, %[[VAL_87:.*]] = %[[VAL_83]], %[[VAL_88:.*]] = %[[VAL_36]]) : (i1, index, index, tensor<6x6xi32, #{{.*}}>) -> (i1, index, index, tensor<6x6xi32, #{{.*}}>) {
				// CHECK: scf.condition(%[[VAL_85]]) %[[VAL_85]], %[[VAL_86]], %[[VAL_87]], %[[VAL_88]] : i1, index, index, tensor<6x6xi32, #{{.*}}>
				// CHECK: } do {
				// CHECK: ^bb0(%[[VAL_89:.*]]: i1, %[[VAL_90:.*]]: index, %[[VAL_91:.*]]: index, %[[VAL_92:.*]]: tensor<6x6xi32, #{{.*}}>):
				// CHECK: %[[VAL_93:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_94:.*]] = arith.addi %[[VAL_93]], %[[VAL_6]] : index
				// CHECK: %[[VAL_95:.*]] = arith.addi %[[VAL_94]], %[[VAL_5]] : index
				// CHECK: %[[VAL_96:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_94]]] : memref<?xindex>
				// CHECK: %[[VAL_97:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_95]]] : memref<?xindex>
				// CHECK: %[[VAL_98:.*]] = arith.addi %[[VAL_35]], %[[VAL_3]] : index
				// CHECK: %[[VAL_99:.*]]:5 = scf.while (%[[VAL_100:.*]] = %[[VAL_96]], %[[VAL_101:.*]] = %[[VAL_9]], %[[VAL_102:.*]] = %[[VAL_8]], %[[VAL_103:.*]] = %[[VAL_10]], %[[VAL_104:.*]] = %[[VAL_92]]) : (index, i1, i32, i1, tensor<6x6xi32, #{{.*}}>) -> (index, i1, i32, i1, tensor<6x6xi32, #{{.*}}>) {
				// CHECK: %[[VAL_105:.*]] = arith.cmpi ult, %[[VAL_100]], %[[VAL_97]] : index
				// CHECK: %[[VAL_106:.*]] = arith.andi %[[VAL_101]], %[[VAL_105]] : i1
				// CHECK: scf.condition(%[[VAL_106]]) %[[VAL_100]], %[[VAL_101]], %[[VAL_102]], %[[VAL_103]], %[[VAL_104]] : index, i1, i32, i1, tensor<6x6xi32, #{{.*}}>
				// CHECK: } do {
				// CHECK: ^bb0(%[[VAL_107:.*]]: index, %[[VAL_108:.*]]: i1, %[[VAL_109:.*]]: i32, %[[VAL_110:.*]]: i1, %[[VAL_111:.*]]: tensor<6x6xi32, #{{.*}}>):
				// CHECK: %[[VAL_112:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_107]]] : memref<?xindex>
				// CHECK: %[[VAL_113:.*]] = arith.cmpi ult, %[[VAL_112]], %[[VAL_98]] : index
				// CHECK: %[[VAL_114:.*]]:3 = scf.if %[[VAL_113]] -> (i32, i1, tensor<6x6xi32, #{{.*}}>) {
				// CHECK: %[[VAL_115:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_107]]] : memref<?xindex>
				// CHECK: %[[VAL_116:.*]] = arith.subi %[[VAL_115]], %[[VAL_35]] : index
				// CHECK: %[[VAL_117:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_118:.*]] = arith.addi %[[VAL_117]], %[[VAL_6]] : index
				// CHECK: %[[VAL_119:.*]] = arith.addi %[[VAL_118]], %[[VAL_5]] : index
				// CHECK: %[[VAL_120:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_118]]] : memref<?xindex>
				// CHECK: %[[VAL_121:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_119]]] : memref<?xindex>
				// CHECK: %[[VAL_122:.*]] = arith.addi %[[VAL_91]], %[[VAL_3]] : index
				// CHECK: %[[VAL_123:.*]]:5 = scf.while (%[[VAL_124:.*]] = %[[VAL_120]], %[[VAL_125:.*]] = %[[VAL_9]], %[[VAL_126:.*]] = %[[VAL_109]], %[[VAL_127:.*]] = %[[VAL_110]], %[[VAL_128:.*]] = %[[VAL_111]]) : (index, i1, i32, i1, tensor<6x6xi32, #{{.*}}>) -> (index, i1, i32, i1, tensor<6x6xi32, #{{.*}}>) {
				// CHECK: %[[VAL_129:.*]] = arith.cmpi ult, %[[VAL_124]], %[[VAL_121]] : index
				// CHECK: %[[VAL_130:.*]] = arith.andi %[[VAL_125]], %[[VAL_129]] : i1
				// CHECK: scf.condition(%[[VAL_130]]) %[[VAL_124]], %[[VAL_125]], %[[VAL_126]], %[[VAL_127]], %[[VAL_128]] : index, i1, i32, i1, tensor<6x6xi32, #{{.*}}>
				// CHECK: } do {
				// CHECK: ^bb0(%[[VAL_131:.*]]: index, %[[VAL_132:.*]]: i1, %[[VAL_133:.*]]: i32, %[[VAL_134:.*]]: i1, %[[VAL_135:.*]]: tensor<6x6xi32, #{{.*}}>):
				// CHECK: %[[VAL_136:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_131]]] : memref<?xindex>
				// CHECK: %[[VAL_137:.*]] = arith.cmpi ult, %[[VAL_136]], %[[VAL_122]] : index
				// CHECK: %[[VAL_138:.*]]:3 = scf.if %[[VAL_137]] -> (i32, i1, tensor<6x6xi32, #{{.*}}>) {
				// CHECK: %[[VAL_139:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_131]]] : memref<?xindex>
				// CHECK: %[[VAL_140:.*]] = arith.subi %[[VAL_139]], %[[VAL_91]] : index
				// CHECK: %[[VAL_141:.*]] = memref.load %[[VAL_16]]{{\[}}%[[VAL_131]]] : memref<?xi32>
				// CHECK: %[[VAL_142:.*]] = memref.load %[[VAL_17]]{{\[}}%[[VAL_116]], %[[VAL_140]]] : memref<3x3xi32>
				// CHECK: %[[VAL_143:.*]] = arith.muli %[[VAL_141]], %[[VAL_142]] : i32
				// CHECK: %[[VAL_144:.*]] = arith.addi %[[VAL_133]], %[[VAL_143]] : i32
				// CHECK: scf.yield %[[VAL_144]], %[[VAL_9]], %[[VAL_135]] : i32, i1, tensor<6x6xi32, #{{.*}}>
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_133]], %[[VAL_134]], %[[VAL_135]] : i32, i1, tensor<6x6xi32, #{{.*}}>
				// CHECK: } {"Emitted from" = "slice"}
				// CHECK: %[[VAL_145:.*]] = arith.addi %[[VAL_131]], %[[VAL_5]] : index
				// CHECK: scf.yield %[[VAL_145]], %[[VAL_137]], %[[VAL_146:.*]]#0, %[[VAL_146]]#1, %[[VAL_146]]#2 : index, i1, i32, i1, tensor<6x6xi32, #{{.*}}>
				// CHECK: } attributes {"Emitted from" = "linalg.generic"}
				// CHECK: %[[VAL_147:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_148:.*]] = arith.addi %[[VAL_147]], %[[VAL_6]] : index
				// CHECK: memref.store %[[VAL_148]], %[[VAL_18]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_149:.*]]#2, %[[VAL_9]], %[[VAL_149]]#4 : i32, i1, tensor<6x6xi32, #{{.*}}>
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_109]], %[[VAL_110]], %[[VAL_111]] : i32, i1, tensor<6x6xi32, #{{.*}}>
				// CHECK: } {"Emitted from" = "slice"}
				// CHECK: %[[VAL_150:.*]] = arith.addi %[[VAL_107]], %[[VAL_5]] : index
				// CHECK: scf.yield %[[VAL_150]], %[[VAL_113]], %[[VAL_151:.*]]#0, %[[VAL_151]]#1, %[[VAL_151]]#2 : index, i1, i32, i1, tensor<6x6xi32, #{{.*}}>
				// CHECK: } attributes {"Emitted from" = "linalg.generic"}
				// CHECK: %[[VAL_152:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_153:.*]] = arith.addi %[[VAL_152]], %[[VAL_6]] : index
				// CHECK: memref.store %[[VAL_153]], %[[VAL_19]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_154:.*]] = scf.if %[[VAL_155:.*]]#3 -> (tensor<6x6xi32, #{{.*}}>) {
				// CHECK: %[[VAL_156:.*]] = sparse_tensor.insert %[[VAL_155]]#2 into %[[VAL_155]]#4{{\[}}%[[VAL_35]], %[[VAL_91]]] : tensor<6x6xi32, #{{.*}}>
				// CHECK: scf.yield %[[VAL_156]] : tensor<6x6xi32, #{{.*}}>
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_157:.*]]#4 : tensor<6x6xi32, #{{.*}}>
				// CHECK: }
				// CHECK: memref.store %[[VAL_4]], %[[VAL_19]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_4]], %[[VAL_18]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_158:.*]] = arith.cmpi ugt, %[[VAL_90]], %[[VAL_91]] : index
				// CHECK: %[[VAL_159:.*]]:3 = scf.if %[[VAL_158]] -> (index, i1, index) {
				// CHECK: %[[VAL_160:.*]] = arith.addi %[[VAL_91]], %[[VAL_5]] : index
				// CHECK: scf.yield %[[VAL_90]], %[[VAL_89]], %[[VAL_160]] : index, i1, index
				// CHECK: } else {
				// CHECK: %[[VAL_161:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_4]]] : memref<?xindex>
				// CHECK: %[[VAL_162:.*]]:2 = scf.for %[[VAL_163:.*]] = %[[VAL_6]] to %[[VAL_161]] step %[[VAL_6]] iter_args(%[[VAL_164:.*]] = %[[VAL_2]], %[[VAL_165:.*]] = %[[VAL_10]]) -> (index, i1) {
				// CHECK: %[[VAL_166:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_163]]] : memref<?xindex>
				// CHECK: %[[VAL_167:.*]] = arith.addi %[[VAL_163]], %[[VAL_5]] : index
				// CHECK: %[[VAL_168:.*]] = memref.load %[[VAL_18]]{{\[}}%[[VAL_167]]] : memref<?xindex>
				// CHECK: %[[VAL_169:.*]] = arith.cmpi ult, %[[VAL_166]], %[[VAL_168]] : index
				// CHECK: %[[VAL_170:.*]] = scf.if %[[VAL_169]] -> (index) {
				// CHECK: %[[VAL_171:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_166]]] : memref<?xindex>
				// CHECK: %[[VAL_172:.*]] = arith.cmpi eq, %[[VAL_171]], %[[VAL_90]] : index
				// CHECK: %[[VAL_173:.*]] = scf.if %[[VAL_172]] -> (index) {
				// CHECK: %[[VAL_174:.*]] = arith.addi %[[VAL_166]], %[[VAL_5]] : index
				// CHECK: memref.store %[[VAL_174]], %[[VAL_18]]{{\[}}%[[VAL_163]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_174]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_166]] : index
				// CHECK: }
				// CHECK: scf.yield %[[VAL_175:.*]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_166]] : index
				// CHECK: }
				// CHECK: %[[VAL_176:.*]] = arith.cmpi ult, %[[VAL_177:.*]], %[[VAL_168]] : index
				// CHECK: %[[VAL_178:.*]] = scf.if %[[VAL_176]] -> (index) {
				// CHECK: %[[VAL_179:.*]] = memref.load %[[VAL_15]]{{\[}}%[[VAL_177]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_179]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_164]] : index
				// CHECK: }
				// CHECK: %[[VAL_180:.*]] = arith.ori %[[VAL_176]], %[[VAL_165]] : i1
				// CHECK: %[[VAL_181:.*]] = arith.cmpi ult, %[[VAL_182:.*]], %[[VAL_164]] : index
				// CHECK: %[[VAL_183:.*]] = arith.select %[[VAL_181]], %[[VAL_182]], %[[VAL_164]] : index
				// CHECK: scf.yield %[[VAL_183]], %[[VAL_180]] : index, i1
				// CHECK: }
				// CHECK: %[[VAL_184:.*]] = arith.addi %[[VAL_185:.*]]#0, %[[VAL_5]] : index
				// CHECK: %[[VAL_186:.*]] = arith.subi %[[VAL_184]], %[[VAL_3]] : index
				// CHECK: %[[VAL_187:.*]] = arith.cmpi uge, %[[VAL_184]], %[[VAL_3]] : index
				// CHECK: %[[VAL_188:.*]] = arith.select %[[VAL_187]], %[[VAL_186]], %[[VAL_4]] : index
				// CHECK: scf.yield %[[VAL_185]]#0, %[[VAL_185]]#1, %[[VAL_188]] : index, i1, index
				// CHECK: }
				// CHECK: %[[VAL_189:.*]] = arith.addi %[[VAL_91]], %[[VAL_5]] : index
				// CHECK: %[[VAL_190:.*]] = arith.cmpi ugt, %[[VAL_191:.*]]#2, %[[VAL_189]] : index
				// CHECK: %[[VAL_192:.*]] = arith.select %[[VAL_190]], %[[VAL_191]]#2, %[[VAL_189]] : index
				// CHECK: %[[VAL_193:.*]] = arith.addi %[[VAL_192]], %[[VAL_3]] : index
				// CHECK: %[[VAL_194:.*]] = arith.cmpi ule, %[[VAL_193]], %[[VAL_2]] : index
				// CHECK: %[[VAL_195:.*]] = arith.andi %[[VAL_191]]#1, %[[VAL_194]] : i1
				// CHECK: scf.yield %[[VAL_195]], %[[VAL_191]]#0, %[[VAL_192]], %[[VAL_196:.*]] : i1, index, index, tensor<6x6xi32, #{{.*}}>
				// CHECK: } attributes {"Emitted from" = "linalg.generic"}
				// CHECK: memref.store %[[VAL_4]], %[[VAL_19]]{{\[}}%[[VAL_5]]] : memref<?xindex>
				// CHECK: %[[VAL_197:.*]] = arith.cmpi ugt, %[[VAL_34]], %[[VAL_35]] : index
				// CHECK: %[[VAL_198:.*]]:3 = scf.if %[[VAL_197]] -> (index, i1, index) {
				// CHECK: %[[VAL_199:.*]] = arith.addi %[[VAL_35]], %[[VAL_5]] : index
				// CHECK: scf.yield %[[VAL_34]], %[[VAL_33]], %[[VAL_199]] : index, i1, index
				// CHECK: } else {
				// CHECK: %[[VAL_200:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_4]]] : memref<?xindex>
				// CHECK: %[[VAL_201:.*]]:2 = scf.for %[[VAL_202:.*]] = %[[VAL_6]] to %[[VAL_200]] step %[[VAL_6]] iter_args(%[[VAL_203:.*]] = %[[VAL_2]], %[[VAL_204:.*]] = %[[VAL_10]]) -> (index, i1) {
				// CHECK: %[[VAL_205:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_202]]] : memref<?xindex>
				// CHECK: %[[VAL_206:.*]] = arith.addi %[[VAL_202]], %[[VAL_5]] : index
				// CHECK: %[[VAL_207:.*]] = memref.load %[[VAL_19]]{{\[}}%[[VAL_206]]] : memref<?xindex>
				// CHECK: %[[VAL_208:.*]] = arith.cmpi ult, %[[VAL_205]], %[[VAL_207]] : index
				// CHECK: %[[VAL_209:.*]] = scf.if %[[VAL_208]] -> (index) {
				// CHECK: %[[VAL_210:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_205]]] : memref<?xindex>
				// CHECK: %[[VAL_211:.*]] = arith.cmpi eq, %[[VAL_210]], %[[VAL_34]] : index
				// CHECK: %[[VAL_212:.*]] = scf.if %[[VAL_211]] -> (index) {
				// CHECK: %[[VAL_213:.*]] = arith.addi %[[VAL_205]], %[[VAL_5]] : index
				// CHECK: memref.store %[[VAL_213]], %[[VAL_19]]{{\[}}%[[VAL_202]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_213]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_205]] : index
				// CHECK: }
				// CHECK: scf.yield %[[VAL_214:.*]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_205]] : index
				// CHECK: }
				// CHECK: %[[VAL_215:.*]] = arith.cmpi ult, %[[VAL_216:.*]], %[[VAL_207]] : index
				// CHECK: %[[VAL_217:.*]] = scf.if %[[VAL_215]] -> (index) {
				// CHECK: %[[VAL_218:.*]] = memref.load %[[VAL_13]]{{\[}}%[[VAL_216]]] : memref<?xindex>
				// CHECK: scf.yield %[[VAL_218]] : index
				// CHECK: } else {
				// CHECK: scf.yield %[[VAL_203]] : index
				// CHECK: }
				// CHECK: %[[VAL_219:.*]] = arith.ori %[[VAL_215]], %[[VAL_204]] : i1
				// CHECK: %[[VAL_220:.*]] = arith.cmpi ult, %[[VAL_221:.*]], %[[VAL_203]] : index
				// CHECK: %[[VAL_222:.*]] = arith.select %[[VAL_220]], %[[VAL_221]], %[[VAL_203]] : index
				// CHECK: scf.yield %[[VAL_222]], %[[VAL_219]] : index, i1
				// CHECK: }
				// CHECK: %[[VAL_223:.*]] = arith.addi %[[VAL_224:.*]]#0, %[[VAL_5]] : index
				// CHECK: %[[VAL_225:.*]] = arith.subi %[[VAL_223]], %[[VAL_3]] : index
				// CHECK: %[[VAL_226:.*]] = arith.cmpi uge, %[[VAL_223]], %[[VAL_3]] : index
				// CHECK: %[[VAL_227:.*]] = arith.select %[[VAL_226]], %[[VAL_225]], %[[VAL_4]] : index
				// CHECK: scf.yield %[[VAL_224]]#0, %[[VAL_224]]#1, %[[VAL_227]] : index, i1, index
				// CHECK: }
				// CHECK: %[[VAL_228:.*]] = arith.addi %[[VAL_35]], %[[VAL_5]] : index
				// CHECK: %[[VAL_229:.*]] = arith.cmpi ugt, %[[VAL_230:.*]]#2, %[[VAL_228]] : index
				// CHECK: %[[VAL_231:.*]] = arith.select %[[VAL_229]], %[[VAL_230]]#2, %[[VAL_228]] : index
				// CHECK: %[[VAL_232:.*]] = arith.addi %[[VAL_231]], %[[VAL_3]] : index
				// CHECK: %[[VAL_233:.*]] = arith.cmpi ule, %[[VAL_232]], %[[VAL_2]] : index
				// CHECK: %[[VAL_234:.*]] = arith.andi %[[VAL_230]]#1, %[[VAL_233]] : i1
				// CHECK: scf.yield %[[VAL_234]], %[[VAL_230]]#0, %[[VAL_231]], %[[VAL_235:.*]]#3 : i1, index, index, tensor<6x6xi32, #{{.*}}>
				// CHECK: } attributes {"Emitted from" = "linalg.generic"}
				// CHECK: %[[VAL_236:.*]] = sparse_tensor.load %[[VAL_237:.*]]#3 hasInserts : tensor<6x6xi32, #{{.*}}>
				// CHECK: return %[[VAL_236]] : tensor<6x6xi32, #{{.*}}>
				// CHECK: }
				func.func @conv2d_all_sparse_CSR(%arg0: tensor<8x8xi32, #DCSR>,
				%arg1: tensor<3x3xi32>) -> tensor<6x6xi32, #DCSR> {
				%0 = bufferization.alloc_tensor() : tensor<6x6xi32, #DCSR>
				%1 = linalg.generic {
				indexing_maps = [#map, #map1, #map2],
				iterator_types = ["parallel", "parallel", "reduction", "reduction"]}
				ins(%arg0, %arg1 : tensor<8x8xi32, #DCSR>, tensor<3x3xi32>)
				outs(%0 : tensor<6x6xi32, #DCSR>) {
				^bb0(%in: i32, %in_0: i32, %out: i32):
				%2 = arith.muli %in, %in_0 : i32
				%3 = arith.addi %out, %2 : i32
				linalg.yield %3 : i32
				} -> tensor<6x6xi32, #DCSR>
				return %1 : tensor<6x6xi32, #DCSR>
				}

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_2d_slice_based.mlir

This file was added.

				// DEFINE: %{option} = "enable-index-reduction=true enable-runtime-library=false"
				// DEFINE: %{command} = mlir-opt %s --sparse-compiler=%{option} \| \
				// DEFINE: mlir-cpu-runner \
				// DEFINE: -e entry -entry-point-result=void \
				// DEFINE: -shared-libs=%mlir_lib_dir/libmlir_c_runner_utils%shlibext \| \
				// DEFINE: FileCheck %s
				//
				// RUN: %{command}

				#map = affine_map<(d0, d1, d2, d3) -> (d0 + d2, d1 + d3)>
				#map1 = affine_map<(d0, d1, d2, d3) -> (d2, d3)>
				#map2 = affine_map<(d0, d1, d2, d3) -> (d0, d1)>

				#DCSR = #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ] }>

				module {
				func.func @conv2d_all_sparse_CSR(%arg0: tensor<8x8xi32, #DCSR>, %arg1: tensor<3x3xi32>) -> tensor<6x6xi32, #DCSR> {
				%0 = bufferization.alloc_tensor() : tensor<6x6xi32, #DCSR>
				%1 = linalg.generic {
				indexing_maps = [#map, #map1, #map2],
				iterator_types = ["parallel", "parallel", "reduction", "reduction"]}
				ins(%arg0, %arg1 : tensor<8x8xi32, #DCSR>, tensor<3x3xi32>)
				outs(%0 : tensor<6x6xi32, #DCSR>) {
				^bb0(%in: i32, %in_0: i32, %out: i32):
				%2 = arith.muli %in, %in_0 : i32
				%3 = arith.addi %out, %2 : i32
				linalg.yield %3 : i32
				} -> tensor<6x6xi32, #DCSR>
				return %1 : tensor<6x6xi32, #DCSR>
				}

				func.func @entry() {
				%c0 = arith.constant 0 : index
				%i0 = arith.constant 0 : i32

				// A typical edge detection filter.
				%filter = arith.constant dense<[
				[ 1, 0, -1 ],
				[ 0, 0, 0 ],
				[ -1, 0, 1 ]
				]> : tensor<3x3xi32>

				%input = arith.constant dense<[
				[ 1, 2, 3, 4, 0, 6, 7, 8 ],
				[ 2, 2, 4, 4, 0, 0, 6, 8 ],
				[ 2, 2, 4, 4, 0, 0, 6, 8 ],
				[ 2, 2, 3, 4, 0, 0, 7, 8 ],
				[ 1, 3, 3, 4, 0, 0, 6, 8 ],
				[ 3, 2, 3, 4, 0, 0, 7, 8 ],
				[ 1, 3, 3, 4, 3, 6, 6, 8 ],
				[ 1, 3, 3, 4, 3, 0, 7, 8 ]
				]> : tensor<8x8xi32>

				%sparse_filter_CSR = sparse_tensor.convert %filter
				: tensor<3x3xi32> to tensor<3x3xi32>

				%sparse_input_CSR = sparse_tensor.convert %input
				: tensor<8x8xi32> to tensor<8x8xi32, #DCSR>

				%3 = call @conv2d_all_sparse_CSR(%sparse_input_CSR, %sparse_filter_CSR)
				: (tensor<8x8xi32, #DCSR>,
				tensor<3x3xi32>) -> tensor<6x6xi32, #DCSR>

				%out = sparse_tensor.convert %3
				: tensor<6x6xi32, #DCSR> to tensor<6x6xi32>
				//
				// CHECK: ( ( 0, 0, -1, -6, -1, 6 ),
				// CHECK-SAME: ( -1, 0, 1, 0, 1, 0 ),
				// CHECK-SAME: ( 0, -1, 1, 0, 0, 0 ),
				// CHECK-SAME: ( -1, 0, 0, 0, 0, 0 ),
				// CHECK-SAME: ( 0, 0, 3, 6, -3, -6 ),
				// CHECK-SAME: ( 2, -1, 3, 0, -3, 0 ) )
				//
				%v2 = vector.transfer_read %out[%c0, %c0], %i0
				: tensor<6x6xi32>, vector<6x6xi32>
				vector.print %v2 : vector<6x6xi32>

				return
				}

				}
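
As a sanity check on the CHECK/CHECK-SAME values in this test, the expected 6x6 result can be reproduced with a plain dense reference. The sketch below is not part of the revision; `conv2d`, `FILTER`, `INPUT`, and `EXPECTED` are illustrative names. It assumes the semantics implied by the three affine maps of the `linalg.generic` op, i.e. `out[d0][d1] += in[d0 + d2][d1 + d3] * filter[d2][d3]` with no padding and unit stride.

```python
# Dense reference for the 2-D slice-based convolution test (illustrative only).

def conv2d(inp, flt):
    """Direct convolution without padding, stride 1, matching the affine maps:
    input -> (d0 + d2, d1 + d3), filter -> (d2, d3), output -> (d0, d1)."""
    kh, kw = len(flt), len(flt[0])
    oh, ow = len(inp) - kh + 1, len(inp[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for d0 in range(oh):              # parallel
        for d1 in range(ow):          # parallel
            for d2 in range(kh):      # reduction
                for d3 in range(kw):  # reduction
                    out[d0][d1] += inp[d0 + d2][d1 + d3] * flt[d2][d3]
    return out

# Values copied from the test's %filter and %input constants.
FILTER = [
    [1, 0, -1],
    [0, 0, 0],
    [-1, 0, 1],
]
INPUT = [
    [1, 2, 3, 4, 0, 6, 7, 8],
    [2, 2, 4, 4, 0, 0, 6, 8],
    [2, 2, 4, 4, 0, 0, 6, 8],
    [2, 2, 3, 4, 0, 0, 7, 8],
    [1, 3, 3, 4, 0, 0, 6, 8],
    [3, 2, 3, 4, 0, 0, 7, 8],
    [1, 3, 3, 4, 3, 6, 6, 8],
    [1, 3, 3, 4, 3, 0, 7, 8],
]
# Values copied from the test's CHECK / CHECK-SAME lines.
EXPECTED = [
    [0, 0, -1, -6, -1, 6],
    [-1, 0, 1, 0, 1, 0],
    [0, -1, 1, 0, 0, 0],
    [-1, 0, 0, 0, 0, 0],
    [0, 0, 3, 6, -3, -6],
    [2, -1, 3, 0, -3, 0],
]

RESULT = conv2d(INPUT, FILTER)
assert RESULT == EXPECTED
```

Because the sparse kernel only skips zero entries of the input, the dense reference and the sparsified code should agree element for element.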

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_3d_slice_based.mlir

This file was added.

				// DEFINE: %{option} = "enable-index-reduction=true enable-runtime-library=false"
				// DEFINE: %{command} = mlir-opt %s --sparse-compiler=%{option} \| \
				// DEFINE: mlir-cpu-runner \
				// DEFINE: -e entry -entry-point-result=void \
				// DEFINE: -shared-libs=%mlir_lib_dir/libmlir_c_runner_utils%shlibext \| \
				// DEFINE: FileCheck %s
				//
				// RUN: %{command}

				#CCC = #sparse_tensor.encoding<{
				dimLevelType = [ "compressed", "compressed", "compressed" ]
				}>

				func.func @alloc_3d_filled_f32(%s1 : index, %s2 : index, %s3 : index, %f : f32) -> tensor<?x?x?xf32> {
				%buf = bufferization.alloc_tensor(%s1, %s2, %s3) : tensor<?x?x?xf32>
				%ret = linalg.fill ins(%f : f32) outs(%buf : tensor<?x?x?xf32>) -> tensor<?x?x?xf32>
				return %ret : tensor<?x?x?xf32>
				}

				func.func @conv_3d_CCC(%arg0: tensor<?x?x?xf32, #CCC>, %arg1: tensor<?x?x?xf32>) -> tensor<?x?x?xf32, #CCC> {
				%c6 = arith.constant 6 : index
				%s = bufferization.alloc_tensor(%c6, %c6, %c6) : tensor<?x?x?xf32, #CCC>
				%ret = linalg.conv_3d
				ins (%arg0, %arg1: tensor<?x?x?xf32, #CCC>, tensor<?x?x?xf32>)
				outs (%s: tensor<?x?x?xf32, #CCC>) -> tensor<?x?x?xf32, #CCC>
				return %ret : tensor<?x?x?xf32, #CCC>
				}

				func.func @entry() {
				%c0 = arith.constant 0 : index
				%c1 = arith.constant 1 : index
				%c3 = arith.constant 3 : index
				%c6 = arith.constant 6 : index
				%c8 = arith.constant 8 : index
				%f10 = arith.constant 10.00000e+00 : f32
				%val = arith.constant 2.00000e+00 : f32
				%zero = arith.constant 0.00000e+00 : f32

				%filter3D = call @alloc_3d_filled_f32(%c3, %c3, %c3, %val) : (index, index, index, f32) -> (tensor<?x?x?xf32>)
				%in3D_tmp = call @alloc_3d_filled_f32(%c8, %c8, %c8, %val) : (index, index, index, f32) -> (tensor<?x?x?xf32>)
				%in3D = tensor.insert %f10 into %in3D_tmp[%c0, %c3, %c0] : tensor<?x?x?xf32>
				%out3D = call @alloc_3d_filled_f32(%c6, %c6, %c6, %zero) : (index, index, index, f32) -> (tensor<?x?x?xf32>)

				%in3D_CCC = sparse_tensor.convert %in3D
				: tensor<?x?x?xf32> to tensor<?x?x?xf32, #CCC>
				%CCC_ret = call @conv_3d_CCC(%in3D_CCC, %filter3D) : (tensor<?x?x?xf32, #CCC>, tensor<?x?x?xf32>) -> (tensor<?x?x?xf32, #CCC>)
				// CHECK: ( ( ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 124, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 124, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 124, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				// CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				// CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				// CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				// CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ),
				// CHECK-SAME: ( ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ),
				// CHECK-SAME: ( 108, 108, 108, 108, 108, 108 ) ) )
				%1 = sparse_tensor.convert %CCC_ret
				: tensor<?x?x?xf32, #CCC> to tensor<?x?x?xf32>
				%v1 = vector.transfer_read %1[%c0, %c0, %c0], %zero
				: tensor<?x?x?xf32>, vector<6x6x6xf32>
				vector.print %v1 : vector<6x6x6xf32>

				// Free the resources
				bufferization.dealloc_tensor %in3D : tensor<?x?x?xf32>
				bufferization.dealloc_tensor %filter3D : tensor<?x?x?xf32>

				bufferization.dealloc_tensor %in3D_CCC : tensor<?x?x?xf32, #CCC>
				bufferization.dealloc_tensor %CCC_ret : tensor<?x?x?xf32, #CCC>

				return
				}
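
The expected values in this test (108 nearly everywhere, 124 in three places) follow from simple arithmetic, which the sketch below reproduces. It is not part of the revision; `filled` and `conv3d` are illustrative helpers, and the sketch assumes `linalg.conv_3d` computes `out[d][h][w] += in[d + kd][h + kh][w + kw] * filter[kd][kh][kw]` with no padding and unit stride. Each output sums 27 products of 2 * 2, giving 108; a window that covers the single 10 at input position (0, 3, 0) gains 10 * 2 - 2 * 2 = 16, giving 124.

```python
# Dense reference for the 3-D slice-based convolution test (illustrative only).

def filled(n, value):
    """Return an n x n x n nested list filled with `value`."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]

def conv3d(inp, flt):
    """Direct 3-D convolution without padding, stride 1."""
    k = len(flt)
    o = len(inp) - k + 1
    out = filled(o, 0.0)
    for d in range(o):
        for h in range(o):
            for w in range(o):
                acc = 0.0
                for kd in range(k):
                    for kh in range(k):
                        for kw in range(k):
                            acc += inp[d + kd][h + kh][w + kw] * flt[kd][kh][kw]
                out[d][h][w] = acc
    return out

# Mirrors the test setup: 8x8x8 input of 2.0 with one 10.0 at [0, 3, 0],
# and a 3x3x3 filter of 2.0.
inp = filled(8, 2.0)
inp[0][3][0] = 10.0
flt = filled(3, 2.0)
out = conv3d(inp, flt)

# Output positions whose 3x3x3 window covers input element (0, 3, 0).
hot = [(d, h, w)
       for d in range(6) for h in range(6) for w in range(6)
       if out[d][h][w] == 124.0]
assert hot == [(0, 1, 0), (0, 2, 0), (0, 3, 0)]
```

The three positions line up with the `124` entries in rows 1 to 3 of the first 6x6 block of the CHECK lines.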

[mlir][sparse] extend loop emitter to emit slice driven loops
Closed, Public

Revision Contents

Diff 509708

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp

mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp

mlir/test/Dialect/SparseTensor/sparse_conv_2d_slice_based.mlir

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_2d_slice_based.mlir

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_conv_3d_slice_based.mlir