This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Implement custom reduction
Closed, Public

Authored by jim22k on Jul 26 2022, 7:49 AM.

Details

Summary
  • Adds logic within linalg.generic lowering to handle
    sparse_tensor.reduce. The identity argument is used to
    initialize the loop for lexicographic insertion, or as the
    starting value during access-pattern expansion.

  • Adds default identity values for standard reduction operators (see the sketch below).
  • Tests verify correct behavior.
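
For intuition: the identity of a reduction is the value that leaves any operand unchanged when folded in. A minimal sketch of such a mapping (illustrative only; the enum, names, and operator set here are assumptions, not the patch's actual code):

#include <limits>

enum class RedKind { Sum, Product, Min, Max };

// Neutral element for each standard reduction: folding it in any number
// of times leaves the result unchanged.
double reductionIdentity(RedKind k) {
  switch (k) {
  case RedKind::Sum:     return 0.0;                                   // x + 0 == x
  case RedKind::Product: return 1.0;                                   // x * 1 == x
  case RedKind::Min:     return std::numeric_limits<double>::max();    // min(x, id) == x
  case RedKind::Max:     return std::numeric_limits<double>::lowest(); // max(x, id) == x
  }
  return 0.0; // unreachable: all kinds handled above
}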

Diff Detail

Event Timeline

jim22k created this revision. Jul 26 2022, 7:49 AM
Herald added a project: Restricted Project. Jul 26 2022, 7:49 AM
jim22k requested review of this revision. Jul 26 2022, 7:49 AM

@aartbik This is mostly the same code as before the split. I moved getReductionIdentity out of Merger.cpp and it's now fully contained in codegen. I also broke up genInsertionLoad into two functions -- one to handle reductions and one to handle non-reduction cases. This eliminates the duplicate if (codegen.redKind == kNoReduc) checks.

I'm still not sure what all needs to happen for vectorized reductions.

Sorry for the delay getting back to you on this; I was OOO for two weeks. I had a look, and the amount of change in the merger seems about right. The changes in sparsification, however, feel too elaborate. I will hack around a bit with this tomorrow and let you know if there are simpler solutions, and also get back to you on the vectorization question. In the meantime, the changes to sparse_out_reduction.mlir seem self-contained, so that could be a quick first revision, once you have addressed the memsan issue.

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
816

Note that reductions should be fully "scalarized", so I never expect tensor loads to have to deal with reductions

mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp
1041

why is this commented out? do we need some verification on the value of the third parameter?

1043

This probably should become getRegion() when you rebase with some recent changes in MLIR core

mlir/test/Dialect/SparseTensor/roundtrip.mlir
292 ↗(On Diff #447687)

change seems unrelated to this revision

mlir/test/Dialect/SparseTensor/sparse_kernels.mlir
58 ↗(On Diff #447687)

why do we have changes in the original reductions?

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_out_reduction.mlir
1 ↗(On Diff #447687)

The changes in this file merely seem to increase coverage of existing reductions, but they could be done in their own revision, since they are unrelated to this change.

15 ↗(On Diff #447687)

note that this is a DCSC if you want to use the standard naming (you can then rename the one above to DCSR to be consistent)

78 ↗(On Diff #447687)

maybe redprod1 and redprod2, or otherwise redprodDCSR and redprodDCSC?

165 ↗(On Diff #447687)

you will need to release smr and smc as well!

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom.mlir
51

although I of course applaud composing the new feature with other features in our tests, I would expect, for simplicity, that you also have one integration test that just introduces the new sparse_tensor.reduce, without also using the binary or unary ops

mlir/unittests/Dialect/SparseTensor/MergerTest.cpp
264

Should this not be in the binary section?!

I think the confusion is because some other tags are misplaced here ;-)

aartbik added inline comments. Aug 9 2022, 5:16 PM
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
1053

I think you will only need to change the code slightly in this file at *two* places.

(1) At this point (start of a scalarized reduction), when you are in a custom reduction, don't call genTensorLoad, but initialize the "load" Value with the identity value provided by the sparse_tensor.reduce third parameter.

(2) At genInsertionLoad, we need to deal with regular insertions and with access-pattern expansion for custom reductions. In the former case, we load the identity rather than zero, and are done. In the latter case, we cannot simply load the zero from the expanded access pattern, but need to do

if (!expFilled[i])
  value = identity;
else
  value = expValues[i];

You have some of that logic already, but it happens during a regular tensor load. I think you will just need to do it during genInsertionLoad.

Will that provide the semantics you expect?
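
As a scalar model of that insertion-load semantics (illustrative only; expFilled and expValues stand in for the expanded access pattern's "filled" bitmap and value buffer):

#include <cstddef>
#include <vector>

// Positions never written in the expanded row must read as the reduction
// identity, not as zero, while a custom reduction is active.
double expandedLoad(const std::vector<bool> &expFilled,
                    const std::vector<double> &expValues,
                    std::size_t i, double identity) {
  return expFilled[i] ? expValues[i] : identity;
}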

mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp
1042

redop -> redOp

1228

This seems small enough to just do:

case kReduce: {
  ReduceOp redOp = cast<ReduceOp>(tensorExps[e].op);
  return insertYieldOp(rewriter, loc, redOp.getRegion(), {v0, v1});
}
aartbik added inline comments. Aug 10 2022, 1:40 PM
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
122

Comment needs to be updated for kReduce

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
393

not needed, see below

423

not needed, see below

429

Note that you do not have to worry about vectorization, since the "block" of code in the semi-ring operation cannot be vectorized anyway. You will have to make sure, however, that vectorization is disabled in the presence of a custom reduction.

756

I would simplify this into

return builder.create<arith::SelectOp>(loc, isFilled, valAtIndex, identity);

816

Let me refine my comment: I would never expect this to happen in a pure tensor load, but only in an insertion load. So although I would have done it inside that method, this makes sense, except that I would test just for the custom reduction, not all the others, and proceed with the code you had just for the custom reduction.

You will have to check whether we are inside a custom reduction more widely, since redKind is not necessarily set (when we are in an insertion loop). Having better detection here, for example by setting a customRed value during the recursion, will avoid the huge block of code later! All the logic will be here.

Also, this avoids the changes to existing reductions you had.

988

this block of code is not needed when the custom reduction is properly detected during the insertion load; see my code suggestion above

1054

So this branch would become the following, where getCustomRedId fetches the third argument from a custom reduction operation:

if (atStart) {
  Kind kind = merger.exp(last).kind;
  Value load = kind == Kind::kReduce
      ? getCustomRedId(merger.exp(last).op)
      : genTensorLoad(merger, codegen, builder, op, exp);
  codegen.redKind = getReduction(kind);
  codegen.redExp = exp;
  updateReduc(merger, codegen, load, "START");
} else {
  ....
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom.mlir
24

unrelated to this change, but I actually noticed that when we use the custom reduction, we don't seem to verify the iterator types as carefully as is otherwise the case. Did you notice that too? If so, we probably need to fix that in the verifier (hopefully we don't need to go into linalg for that).

152

This is actually WAI (working as intended), sort of. When we do a reduction within the innermost loop, we always come out with a value, even if the loop is never taken. This behavior also happens for regular reductions, and is similar to what the TACO compiler generates (at least at the moment). Agreed, we could improve this, but that will take some more work beyond this new operation.
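
A scalar model of the behavior being described (hypothetical names, not code from the patch):

// The reduction variable is initialized before the loop, so even a
// zero-trip loop writes a value back; shown here for a product reduction.
double reduceRow(const double *vals, int n, double identity) {
  double red = identity; // defined even when n == 0
  for (int i = 0; i < n; i++)
    red *= vals[i];
  return red; // stored unconditionally after the loop
}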

174

Note that recent changes unified this with the bufferization dialect ops:

bufferization.dealloc_tensor

@aartbik Thank you for taking a look at my code and giving me feedback.

I do have a question about the approach you are suggesting. If I check for a custom reduction and only use the identity in that case, it will leave the current behavior of standard reductions unchanged, including the behavior that a reduction with mulf always returns 0.0. Is that the direction you want to go? My PR tried to address that, but I'm okay leaving it alone and focusing only on custom reductions if you feel that is better.

mlir/test/Dialect/SparseTensor/sparse_kernels.mlir
58 ↗(On Diff #447687)

This PR doesn't only implement custom reductions. It also adds getReductionIdentity, which applies to standard reductions like mulf. Without those changes, a reduction using mulf will always return 0.0.

If you want to leave the existing behavior of standard reductions unchanged, that is fine with me. It will then require a custom reduction to use mulf with a specified identity of 1.0.

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_out_reduction.mlir
1 ↗(On Diff #447687)

The tests I added here are meant to add coverage of existing (non-custom) reductions, specifically mulf, which I consider to be in a broken state. That is why I included them here.

Per my note above, I'm okay if we want to leave existing behavior alone. In that case, I would remove these tests.

I propose that we leave the current behavior unchanged (although we should fix both issues eventually, i.e. mul-reduction initialized with 1, as well as a skipped innermost loop yielding no value) and focus on adding the new op only.
After that we can address the two issues pointed out.

Sounds good?

jim22k updated this revision to Diff 452440. Aug 13 2022, 11:23 AM
jim22k marked 20 inline comments as done.
  • Updates based on feedback

@aartbik This is definitely simpler than what I had in Sparsification.cpp. Thanks for pointing me in the right direction.

Almost there! I believe we can make it even simpler with some recursive set/unset.
But then good to go

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
818

since the if returns, no else is needed; also, indent the second return to the left

983

I believe that this function can be simpler too. Simply remove the "last" and all the codegen code you added.
Then, as the else branch at L985, use

if (merger.exp(exp).kind == Kind::kReduce)
  codegen.redCustom = getCustomRedId(merger.exp(exp).op);

and then, right before the exit at L1291, do

if (merger.exp(exp).kind == Kind::kReduce)
  codegen.redCustom = Value();

You can also make redCustom an unsigned and use a -1/exp value instead.
This way, the recursive nature will take care of handling the tensor load in the code above, without special handling.

Of course, this assumes custom reductions are never nested, something that makes sense, but perhaps we should verify that.
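
A minimal model of that recursive set/unset pattern (names invented; the real code tracks an MLIR Value or expression index rather than an int):

#include <cassert>

struct CodegenState {
  int redCustom = -1; // -1 means "not inside a custom reduction"
};

void visitExp(CodegenState &codegen, int exp, bool isCustomReduce) {
  if (isCustomReduce) {
    assert(codegen.redCustom == -1 && "custom reductions must not nest");
    codegen.redCustom = exp; // set on entry to the custom reduction...
  }
  // ... recursively visit operands here; any insertion load below can
  // test codegen.redCustom to substitute the identity value ...
  if (isCustomReduce)
    codegen.redCustom = -1; // ...and unset right before exit
}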

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom.mlir
214

make this more precise; it is not about the Inf per se, but about not having full "fill-in" due to the reduction loop

jim22k updated this revision to Diff 453108. Aug 16 2022, 1:15 PM
jim22k marked 2 inline comments as done.

Further refinement based on feedback

jim22k marked an inline comment as done. Aug 16 2022, 1:17 PM

I hope this is what you had in mind. I didn't see how to make redCustom unsigned, as it needs to store a Value.

Oh, I just meant to store the exp index, and then later extract the value using that; but this is fine too

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
986

we can actually assert here to avoid the nesting problem:

assert(!codegen.redCustom);
codegen.redCustom = ....

aartbik accepted this revision. Aug 16 2022, 4:22 PM

Assuming you address my last few nits (can't help myself), this is good to ship!
Thanks for your patience during this review.

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
984

if you move this below the next few ifs that return, you can do this without the else, and without braces:

if ...
  return
if (merger.exp(exp).kind == Kind::kReduce) {
  codegen.redCustom = ...

This revision is now accepted and ready to land. Aug 16 2022, 4:22 PM
jim22k updated this revision to Diff 453173. Aug 16 2022, 5:26 PM
jim22k marked 2 inline comments as done.
Final updates
This revision was automatically updated to reflect the committed changes.