Code for lowering sparse_tensor.unary and sparse_tensor.binary within a linalg.generic block.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
@aartbik Please take a look and let me know your thoughts on my general approach. I also indicated where I am stuck and could use some help, as I don't truly understand the lattice-set theory.
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
---|---|---|
546 | I need help here. I'm not sure what I need to include such that absentVal is added to the output for every missing value in the input. I assume this is something similar to kTensor or kInvariant, as doing v+1 inside linalg.generic will create a dense output. I need something similar to happen here. | |
608 | I'm stuck here. The kBinary below creates one kBinaryRegion and two kUnaryRegions. These are not called/handled for the vector tests. But for the matrix test, the kBinaryRegion is re-evaluated for some reason. The problem is that the origin sparse_tensor.binary operation has already been split apart. I essentially need a no-op here (i.e. it's already working fine. Don't re-evaluate). I tried calling takeConj again, but that somehow eliminates the disjoint pieces that were added in the kBinary section. | |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_matrix_ops.mlir | ||
107 | This test fails with an assertion error: | |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_vector_ops.mlir | ||
95 | This test passes. | |
167 | This test passes. | |
187 | This test fails with an assertion error: |
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h | ||
---|---|---|
50 | At first reading, I expected the kUnaryRegion and kBinaryRegion, but was a bit surprised with the kUnary and kBinary (since that generalizes a lot of the ops already there). Can you briefly document what the intended differences are here? Scanning ahead, it seems like one is eventually broken up into several of the others, but I wonder if we can't just keep a single kUnaryRegion and kBinaryRegion, and let the op itself drive the codegen? | |
98 | style: period at end in comment Also, originally I avoided linking this TensorExpr back to IR of code in MLIR, since during lattice construction, one must be careful to maintain the right 1:1 correspondence between the lattice point and the IR. Obviously, we have little choice for the new binary/unary regions, but let's document that here carefully. | |
158 | ah, I see you solved the issue I alluded to above by passing this explicitly to the merge operations; let's brainstorm later if we can somehow do this using the child nodes only... | |
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
797–800 | This is very surprising? Is this right? | |
1223 | unnecessary format change? | |
1370 | unnecessary format change? | |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
25 | this is not completely in style with the general setup: only assign the fields that are know to be set, assert on nullptr in all others | |
73 | I think kUnary/BinaryRegion should be new cases that set op, | |
530 | note that this could be a simple cast, not a dyn_cast, but more preferable, it looks like we could simply keep a single kUnaryRegion case, see below | |
546 | This is indeed a bit trickier. For the time being, to focus on the overall logic, let's simply put this case under the same as all the other zero preserving unary ops, and assert that absent() is not set. Then we will enhance the logic later. | |
608 | If you rewrite it as suggested below, I think the no-op comes natural. | |
625 | same, should be a cast here, but it looks like we could simply keep a single kBinaryRegion case by looking at the result of this cast (which then needs to be dynamic again ;-) and decide what to do. I think I prefer that a bit more than artificially introducing the kBinary/kUnary as "handled" cases. | |
632 | it feels this whole block of code, L612 to L639 is really takeDisj with some smart selection of the branches. takeDisj(... , opboth, opleft, opright) and put the takeDisj close to the other, just so that the actual lattic logic is not so deeply burried inside this huge block | |
746 | At first reading, it was surprising to just see "UnaryOp" here, since that looks like just another arith:: version. Perhaps we should rename the Unary/BinaryOp of sparse tensor dialect a bit more specific (can be done later). | |
845–846 | what does it mean if we hit this part? an empty block? or an error? |
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
---|---|---|
797–800 | This is related to my handling of binary[overlap] and unary[present] when those regions are empty. If you look at Merger.cpp:617, it passes a nullptr for the Operation*. That, in turn, manifests in line 886 and returns Value() for the Value output. When that empty Value reaches this code, we skip the insertion. It works, but it does feel like a hack. Can you think of a better way to indicate to the generating code that no output is needed? | |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
625 | That seems reasonable. I will refactor and check the result of dyn_cast. If it fails, it means the BinaryOp has already been handled. buildLattices has unsigned return type, so what should I return in that case? This is where I want a no-op, meaning "don't add any new lattices points". | |
632 | Good idea to move this up near the other takeDisj(). | |
845–846 | This means an empty block, so I want to indicate no value. |
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
---|---|---|
797–800 | It was unexpected, but am okay with this if (1) it is documented and (2) perhaps we can even add some form of assert when !rhs that verifies that indeed we hit an expected case | |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
625 | If at all possible, I would like the "nop" to be detected at a higher level, i.e. before building the new lattices. Otherwise we would have to somehow return the existing set id (returning a nop value is a bit too intrusive to my taste). But after you have done the restructuring suggested here, I will have another look. |
Updates based on feedback
- Create 2nd takeDisj method specifically for binary
- Eliminate kBinaryRegion/kUnaryRegion in favor of no "handled" cases
- Add better comments
- Eliminate the double handling of kBinary by looking at the loop level
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
---|---|---|
1223 | clang-format made this change; not sure why | |
1370 | same as above -- clang-format wanted this changed | |
1603 | This is new and is my attempt to avoid calling kBinary twice so we avoid the need for a "no-op" return. | |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
69 | This is a weaker assert, but is required because splitting up the binary result in some pieces which are fundamentally unary, so they don't have a y. In this iteration, I am still labeling all the split up chunks of binary as kBinary. | |
622 | I don't know if this is the correct approach, but it makes the matrix tests all pass. We only split up the binary operation once and we never need to return a no-op. At least, that will hold as long as z==1 for the last time we try to buildLattices() on the binary operation. | |
650 | I created the new takeDisj, which is very nice. It does mean that all of the split up pieces of kBinary remain labeled as kBinary, even for left and right which only have a single input argument. | |
876 | These can be combined in the switch because their logic is identical. The binary operation splits up into pieces which may have 1 or 2 input arguments, so they look just like the unary operation being split up. | |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_matrix_ops.mlir | ||
107 | This is now passing. |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
---|---|---|
546 | I figured out how to make the absent region work. I essentially treat it like v + 1 (which is a binary function). Using the same logic, I perform a disjunction. The overlap will be replaced by the present block which only has 1 input argument, so it will sub in the left argument rather than being truly binary. | |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_vector_ops.mlir | ||
187 | This test is now passing. |
@aartbik I think this PR is ready for a full review. It's passing all the tests.
- I do need you to look at the if (z != 1) logic and see if that is reasonable. It works for the tests, but I would appreciate your perspective on where that might break.
- I also want to discuss the Kind that is assigned when unary and binary are broken up. I currently have the Kind staying unchanged, but that limits what can be asserted in TensorExps constructor. It also makes other checks strange -- for example, mapSet checks that we only pass in unary operations, but I also have to allow kBinary because some of the split up chunks require mapSet.
- You said you wanted to brainstorm how to pass the blocks around via the children rather than the merge operations directly. I'm open to that if we can figure out a way. Otherwise, I don't think the current implementation is bad.
Thanks for making such diligent progress with this and apologies for the delay. A few high prio tasks came up but from here on it should be a bit smoother sailing!
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h | ||
---|---|---|
98 | Note that technically this field is only used for unary and binary, but putting this in a union or so seems too convoluted. But just add a comment on when this field is set. | |
158 | I am okay with this for now (and perhaps permanently). It at least makes semantics at top level explicit. | |
247 | "e" and "i" have some designated meaning, so please don't change into im. Here "z" can use a comment in the method description. | |
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
797–800 | It is a bit strange to refer to "Kinds" here, just rephrase the comment in general terms. | |
1603 | This feels hacky. In the original iteration space theory, building the lattice sets solely depends on the current index, and it has nothing to do with the nesting depth or anything like that. In principle, we need to make this call at every loop nest to determine the remaining expressions. | |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
69 | This feels hacky. I found these asserts very useful to ensure we are truly doing the right thing. If we introduce all sorts of intermediate stages, as in binary but marked unary etc, such important invariants are lost. Can we cleanup the "top level" logic that transforms binary into unary (this is e.g. also done around binary minus into unary minus) | |
75 | assert on op at least | |
80 | && op | |
152 | period at end, here and below. | |
634 | period at end | |
746 | braces not needed | |
790 | braces not needed | |
880 | period at end | |
887 | period at end | |
890 | here and below, document this shortcut returns | |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_matrix_ops.mlir | ||
56 | Period at end | |
57 | Rather than adding and modifying an existing test, I would much rather that you add two new integration tests: unary and binary. Yes, it has some boiler plate to setup inputs, but that way it is more clear where the semi-ring stuff is tested. | |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_vector_ops.mlir | ||
93 | Same here. I would move these into the semi ring integration tests mentioned above, and simply add the novec/vec flags at the top (a test can have more than one RUN/CHECK series) |
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
---|---|---|
1603 | I agree. This was a quick and dirty approach, but I've convinced myself that it is not just hacky, but actually gives wrong results. |
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
---|---|---|
1603 | Looking forward to that. Thanks! |
Change how repeat calls to unary and binary are handled
- Remove z parameter from buildLattices
- Add mergeOp to TensorExp
- Move tests to their own files
Some things to notice with these changes:
- There are now two Operation * in TensorExp. This lets us keep the original binary or unary operation as well as the yield operation to be merged after splitting it up into pieces. By keeping the original, it allows us to perform the same logic in further nested loops. In some of the split up pieces (disjoint left or right regions), we do not include the original op. This essentially signals that these pieces have been "handled". Only one region will contain the original and will continue building in the next level down.
- There is one case where the Kind changes. kBinary keeps the conjoint piece as kBinary, but the disjoint pieces become kUnary. This lets us retain the check in mapSet that requires a unary kind. If you don't like this change of kind, I can change it. It only affects some of the asserts.
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
447 ↗ | (On Diff #424560) | Can you actually add a test of this, something like %0 = linalg.generic #trait_vec_op ins(%arga, %argb: tensor<?xi64, #SparseVector>, tensor<?xi64, #SparseVector>) outs(%xv: tensor<?xi64, #SparseVector>) { ^bb(%a: i64, %b: i64, %x: i64): %idx = linalg.index 0 : index %cst = arith.index_cast %idx : index to i64 %1 = sparse_tensor.binary %a, %b : i64, i64 to i64 overlap={ ^bb0(%a0: i64, %b0: i64): sparse_tensor.yield %cst : i64 } left={} right={} linalg.yield %1 : i64 } -> tensor<?xi64, #SparseVector> and then see if we indeed generate the right code? I am not 100% sure how the part outside the new op will interact with codegen |
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h | ||
98 | Please add a bit more description for both, i.e. origOp points to original unary/unary op, mergeOp points to ... before going in detail of what needs to be there | |
100 | This is a better solution than the original nesting depth based one. Still, the danger of a struct is of course that it is easy to add members :-). And although I doubt this is on any memory profile yet, still a bit worrisome to grow the sizeof for all nodes now. So if we keep this, at least add a TODO on improving memory usage in the future. Furthermore, I am wondering if it perhaps would be better to reflect the state in the type (sort of what you had originally) and just store one op. All of this can be done later, just thinking out loud here. | |
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
797–800 | did you see these comments? | |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
144 | For this first version, I actually prefer that we keep takeDisj intact (other than passing op) takeCombi method that implements your new logic, just so we don't need to touch other ops in this revision. | |
528 | splinter? | |
531 | I patched in your revision, but got compilation errors here. #include "mlir/Dialect/SparseTensor/IR/SparseTensor.h" | |
542 | period at end | |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_binary.mlir | ||
42 ↗ | (On Diff #424560) | Here, and probably for the unary too, scan we add a case where one of the operands is dense (either truly dense, or annotated with "dense"). Just to verify that a lattice with "universal" index is properly tested. |
208 ↗ | (On Diff #424560) | looks like these two are never release (thus will fail memsan) |
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_unary.mlir | ||
40 ↗ | (On Diff #424560) | You have a test with absent, and one with present. Like your %result = sparse_tensor.unary %a : f64 to i32 present={ ^bb0(%x: f64): %ret = arith.constant 1 : i32 sparse_tensor.yield %ret : i32 } absent={ %ret = arith.constant -1 : i32 sparse_tensor.yield %ret : i32 } example? In fact, even though you test the present case with a matrix, how about also doing that for a vector, just so that all cases are covered for a single loop example. That way, the code also acts like a nice illustration of the feature |
112 ↗ | (On Diff #424560) | indentation is off |
119 ↗ | (On Diff #424560) | do we want to do the same thing here? |
140 ↗ | (On Diff #424560) | This is never released (and will thus fail our memsan test). |
Change back to one Operation *
- Add new kBinaryHalf kind
- kUnary and kBinary hold their original operation until buildExp
- During buildExp, locate the primary YieldOp and merge that
- Add more tests
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
447 ↗ | (On Diff #424560) | This will not work. I get this error: /Users/jkitchen/Projects/HIVE/llvm-project/mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_binary.mlir:118:16: error: 'arith.index_cast' op operation destroyed but still has uses %cst = arith.index_cast %idx : index to i32 ^ /Users/jkitchen/Projects/HIVE/llvm-project/mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_binary.mlir:118:16: note: see current operation: %0 = "arith.index_cast"(<<NULL VALUE>>) : (<<NULL TYPE>>) -> i32 /Users/jkitchen/Projects/HIVE/llvm-project/mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_binary.mlir:113:10: note: - use: "sparse_tensor.lex_insert"(%2, %21, <<UNKNOWN SSA VALUE>>) : (tensor<?xi32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ], pointerBitWidth = 0, indexBitWidth = 0 }>>, memref<?xindex>, i32) -> () %0 = linalg.generic #trait_vec_op ^ LLVM ERROR: operation destroyed but still has uses The only way I have been able to use linalg.index is to pass its result into sparse_tensor.binary as one of the arguments, as seen on line 449. That seems to work fine. |
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h | ||
100 | Alright, I settled on another new approach. We can make this work with only a single Operation * in the struct.
When buildLattices encounters kUnary or kBinary, it will build the lattice structure with potentially many regions (i.e. binary always calls takeDisj). The regions which can be considered fully handled will change to kUnaryHandled. The primary region of both unary and binary will remain as kUnary or kBinary and will hold on to the original operation (not the yield). This will let us continue to use the original at every level of the lattice. When we finally call buildExp, we can dig into the original unary and binary operations and pull out the primary region's YieldOp at that point. This approach also avoids confusion because kUnary and kBinary always refer to the correct sparse_tensor.unary and sparse_tensor.binary operations. Any disjoint offshoots use a different "Kind" to clarify that they are not the original operation anymore. I will update the PR with these changes. | |
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
797–800 | I am adding a check to ensure that the kind is either kUnary or kBinary. | |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
531 | That should already be there. See line 11. |
Sorry my comments are out of order with the new PR. They should be read as if they happened before the new PR. I forgot to submit them until now.
A few last comments. Sorry for being nitpicky on this, rest assured I really like the work, I am just a bit particular on certain things.
Also, can you please make a pass over all my comments and mark them "Done" (or comment on them otherwise).
That makes the re-review a bit easier, since a lot of my past comments still show up as unresolv.ed
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
440 ↗ | (On Diff #425306) | See below, but in a nutshell, let's not rewrite the example just because the desired form does not work yet. |
447 ↗ | (On Diff #424560) | But by passing it as argument, we lose the intersection/union power of the binary op. (okay doing this later, but I would prefer that we show a binary op example in the doc that uses two sparse inputs *and* uses the index in one or more of the branches, and not rewrite the example to match the current implementation status) The problem is that we probably need some work dealing with the "outside" block computations that are only used in some places (and thus can be placed under the conditional). We can put a TODO if you prefer. |
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h | ||
49 | Maybe add a comment, as in // semi-ring unary op | |
66 | maybe add same-line comment as in // semi-ring binary op | |
100 | Looks like you iterated over some naming, since I see kBinaryHalf now. | |
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp | ||
802 | if the if-part has braces, so does the else-part, even when it is a single line stmt | |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
24 | : kind(k), val(v), op(operation) { and then nothing below, since these are not part of a union so can always be set Also, it feels weird to have a longer parameter name then field name, so perhaps call the parameter simply "o" | |
75 | can we assert on y in the new model? | |
137 | does this line apply to L144? If so, please please after that line | |
343 | please follow same order as in enum decl. | |
526 | just use cast (not dyn_cast) and no assert, since this should alway work | |
537 | same here and below, if the cast should work, just cast | |
806 | cast, we assume that verifier has filtered out bad cases | |
877 | The unary and binary codegen blocks are a bit too large for this context (most others are oneliners) |
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
447 ↗ | (On Diff #424560) | I would hope that your example test (and the documentation example I originally had) would work. If you are okay with a documentation example that gives an error, but is the desired end goal, I will revert to that. I am just sensitive to introducing a new feature with broken examples, giving people who want to try it out a bad impression of the work, unless you are going to figure out how to make it work before we merge. |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
75 | We cannot. During the original call, y==-1. For this "binary" case handling the absent region, when buildExp is called, we only utilize v0 and completely ignore v1, even though it exists from a lattice perspective. Would it be helpful to add a comment here about why we can't assert anything about y? |
Updates based on feedback
- Change kBinaryHalf to kBinaryBranch
- Revert binaryop doc example
- Remove dynamic casting
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
447 ↗ | (On Diff #424560) | I am okay with that, but please add a small note on that in the doc (this is intended future behavior but still under construction ;-) |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
75 | Yeah, please document, so that in the future I do not try to put the assert back ;-) |
A few last comments, but I am giving you the LGTM, since I think this is ready to go in, so we can work on refining the few remaining issues.
Thanks for your patience during the review, and thanks for this contribution!
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
447 ↗ | (On Diff #424560) | I don't see the note in the doc (I really meant, the user facing example, so that people trying this out are aware it does not work yet). So something like Example of A+B in upper triangle, A-B in lower triangle (not working yet, but construct will be available soon). |
mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp | ||
799 | mark these new helpers as "static" since they are private to the file |
@aartbik Thanks for reviewing and helping me converge on a good solution for lowering.
Once the build passes, I will merge it.
Maybe add a comment, as in
// semi-ring unary op