This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Add rewrite rule for the sort operator.
ClosedPublic

Authored by bixia on Sep 25 2022, 11:27 PM.

Details

Summary

Add a sparse-buffer-rewrite pass that rewrites sparse primitives on buffers into an MLIR implementation.

Add sparse rewrite rule for the sort operator.

Add FileCheck test and integration test.

Diff Detail

Event Timeline

bixia created this revision.Sep 25 2022, 11:27 PM
Herald added a project: Restricted Project.
bixia requested review of this revision.Sep 25 2022, 11:27 PM
Peiming added inline comments.Sep 26 2022, 9:10 AM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
328 ↗(On Diff #462817)

Why is this (and the one above) not static? Is it used outside this file?

419 ↗(On Diff #462817)

Will this tail recursion be optimized (by any pass in MLIR)?

573 ↗(On Diff #462817)

Do you think it is a good idea to decouple SortOp from these functions? That is, they would instead take two ValueRanges, one for the arrays being sorted and one for the parallel arrays (it will probably make Aart's life easier for the compress operator later).

You could also make the compare function an extra callback argument, to make it extensible for other orderings.

We could also move these functions into CodegenUtils.cpp, maybe?

Peiming added inline comments.Sep 26 2022, 9:30 AM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
382–383 ↗(On Diff #462817)

Just out of curiosity, is there any reason why you chose to generate a function here? (You could do it fully inline, right?)

bixia updated this revision to Diff 462981.Sep 26 2022, 11:47 AM
bixia marked 2 inline comments as done.

Decouple SortOp from the utility routines that generate sorting code.
Add missing static keyword to a routine.

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
328 ↗(On Diff #462817)

Was a mistake.

573 ↗(On Diff #462817)

Removed SortOp from those routines. Per offline discussion, we will try progressive lowering and avoid the need to generate sort code without going through the sort op.

aartbik added inline comments.Sep 26 2022, 1:07 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
1031 ↗(On Diff #462981)

As discussed offline, let's put this in a rewriter pass (that runs after codegen).
So we will have:

rewriting: pre-rewriting

sparsification

conversion: "codegen", which can introduce sort and push_back etc.

rewriting: post-rewriting (which deals with sort, push_back, foreach? etc)

wrengr added inline comments.Sep 26 2022, 3:12 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
18 ↗(On Diff #462981)

Usually we put angle-bracket includes after all the double-quote includes

321–324 ↗(On Diff #462981)

This is already available as drop_front (inherited from llvm::detail::indexed_accessor_range_base)

Since this is a sizable chunk of code that's rather independent of the rest of the codegen, I think it should be moved off to a separate file. (Anticipating there being a number of other similar large chunks of codegen code, I'd suggest adding a subdirectory and calling it lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen/Sort.cpp)

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
341 ↗(On Diff #462981)

I think this needs a better name. When reading the definition itself it looks fine, but later on when it's called I got a bit confused because the name suggests (to me) that it's returning the main "sort" function itself, rather than being reused to do the name mangling for any helper function associated with the main sort function.

If you move all the sorting codegen off to its own file, then we could name this getMangledFunc (or getTypeMangledFunc, getFuncInstanceForType, etc). Whereas if you keep it in this file, then it'd need something more explicit like getMangledSortingHelperFunc to keep it clear that it's only for sorting helper functions.

345 ↗(On Diff #462981)

I think the namePrefix should be moved earlier in the arguments. To closely match the argument ordering used elsewhere in MLIR it should be (builder, insertPoint, resultTypes, namePrefix, dim, operands, createFunc); that is, so it matches the (returnType, name, operands) ordering used elsewhere. But regardless of the ordering of the other arguments, the point is to make sure the createFunc argument comes last, since doing so allows for nicer code formatting when the createFunc is a lambda (cf., the "To take best advantage of this formatting" paragraph at https://llvm.org/docs/CodingStandards.html#format-lambdas-like-blocks-of-code )
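For illustration, here is a minimal C++ sketch of the suggested ordering; the name getFuncInstanceForType and its parameters are hypothetical stand-ins, not the actual patch. Putting the callback last lets a lambda argument format like a block:

```cpp
#include <functional>
#include <string>

// Hypothetical helper whose createFunc callback comes last, matching the
// suggested (returnType, name, operands, createFunc) ordering.
static int getFuncInstanceForType(const std::string &namePrefix, int dim,
                                  const std::function<int(int)> &createFunc) {
  // The real pass would look up or create the mangled function here;
  // this sketch just invokes the callback.
  return createFunc(dim);
}

// Because createFunc is the trailing argument, the lambda body formats
// like an ordinary block of code:
int example() {
  return getFuncInstanceForType("_sort_helper", 3, [](int dim) {
    return dim * 2; // stand-in for emitting the function body
  });
}
```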

373–374 ↗(On Diff #462981)

I'm curious why you have the i != j conditional as part of the generated function, rather than having it at the callsites?

421 ↗(On Diff #462981)

This should be spelled "than", not "then". Ditto for createLessThanFunc, etc.

522 ↗(On Diff #462981)

ditto: "lessThan"

524 ↗(On Diff #462981)

ditto: "less_than"

419 ↗(On Diff #462817)

Also, I think it would probably be better to convert this recursion into a for-loop instead. That way we can leave it up to downstream lowerings to decide whether and how much to unroll the loop. (Or this pass could be given a configuration setting to allow the client to decide whether they want the loop unrolled or not.)

bixia updated this revision to Diff 463254.Sep 27 2022, 9:12 AM
bixia marked 7 inline comments as done.

Move the implementation to sparse-buffer-rewrite.

bixia added a comment.Sep 27 2022, 9:13 AM

Since this is a sizable chunk of code that's rather independent of the rest of the codegen, I think it should be moved off to a separate file. (Anticipating there being a number of other similar large chunks of codegen code, I'd suggest adding a subdirectory and calling it lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen/Sort.cpp)

This is now moved to a file for rewriting primitives that use buffers.

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
341 ↗(On Diff #462981)

Changed to use getMangledSortHelperFunc.

345 ↗(On Diff #462981)

Used this order: (builder, insertPoint, resultTypes, namePrefix, dim, operands, createFunc).

373–374 ↗(On Diff #462981)

Renamed this to createMaySwapFunc to clarify the purpose of the function and hopefully avoid the confusion.
It has two call sites, which can share the code that creates the conditional block.

419 ↗(On Diff #462817)

This part doesn't generate recursive calls in MLIR; only the compiler codegen algorithm itself is recursive.
But I rewrote the compiler codegen into a loop anyway.

bixia retitled this revision from [mlir][sparse] Add codegen rule for the sort operator. to [mlir][sparse] Add rewrite rule for the sort operator..Sep 27 2022, 9:13 AM
bixia edited the summary of this revision. (Show Details)
bixia updated this revision to Diff 463337.Sep 27 2022, 2:28 PM

Rebase.

wrengr added inline comments.Sep 27 2022, 4:10 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
419 ↗(On Diff #462817)

Yeah, I know the generated MLIR wasn't making recursive function calls. (If it were then lower-level passes would have a much harder time determining if it's safe to unroll.) And I'm totally fine with the C++ function being recursive (since the recursion has bounded depth).

Rather, my concern was about the MLIR-code bloat of always unrolling the loop. I'm not convinced there's always a performance benefit to unrolling the loop, hence my suggestion to generate an MLIR-loop and leave it to subsequent passes to decide whether and how much to unroll it (or vectorize it, etc).

bixia added inline comments.Sep 27 2022, 4:23 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
419 ↗(On Diff #462817)

This is NOT an unrolled loop in MLIR. The generated code compares two tuples of len(xs) values. It is not a loop because the two tuples are NOT stored in two memrefs that we could loop through.
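For illustration, a hedged C++ model (all names hypothetical, not the actual patch) of what that comparison chain computes: a lexicographic less-than over the i-th and j-th tuples drawn from k parallel buffers, emitted once per dimension rather than looped:

```cpp
#include <cstddef>

// Models the emitted comparison chain: compares the tuples
// (xs[0][i], ..., xs[k-1][i]) and (xs[0][j], ..., xs[k-1][j])
// lexicographically. In the generated MLIR the loop below does not
// exist; the per-dimension comparisons appear as straight-line code,
// since the tuple values live in separate buffers and are never
// materialized as a single collection to loop over.
static bool tupleLessThan(const int *const *xs, std::size_t k,
                          std::size_t i, std::size_t j) {
  for (std::size_t d = 0; d < k; ++d) {
    if (xs[d][i] < xs[d][j])
      return true;
    if (xs[d][i] > xs[d][j])
      return false;
  }
  return false; // the tuples are equal
}
```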

wrengr added inline comments.Sep 27 2022, 5:08 PM
mlir/lib/Dialect/SparseTensor/Pipelines/SparseTensorPipelines.cpp
67

Does this need to be a global pass, or can you use addNestedPass<FuncOp> instead? (It's unclear to me if that's valid for the way getMangledSortHelperFunc is defined)

mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
93

iirc, you can just omit the variable name when it's unused. Though I don't recall if the MLIR style-guide has a stance on the syntax to use here

110

Why not use for (auto arg : args.drop_front(xStartIdx))?

127–128

why call these vi/vj instead of xi/xj?

170

The indentation here doesn't match the preceding if/else if

171

should be x1

171–174

As I mentioned on the comment thread for the previous version, I don't mind the C++ function being recursive. If you're going to unroll the MLIR-loop anyways, then I think the recursive C++-function was cleaner/easier to read. But since you've already made the change, it's up to you which implementation to go with.

FWIW, I only just noticed that the loop is over a ValueRange rather than over a memref or some other runtime collection. So there isn't any clean way to convert this into an MLIR-loop like I'd been suggesting. Since the number of dimensions is generally going to be small, the unrolled loop is surely better than allocating and initializing a collection just to loop over it. Though if we do notice code bloat becoming an issue, then at that point we could always switch over to allocating such a collection (since the same collection would be reused for all the different loops throughout the sorting algorithm, so the cost of constructing it can be amortized).

242

why not boolType or i1Type?

283–292

I haven't checked, but is the current implementation a stable-sort? If so, then should add that to the documentation. If not, then should update the partitioning to make it stable (and then document that fact).

350–361

I think I missed it in the chat but, what's the reason for wanting to cast things to dynamic shapes?

bixia updated this revision to Diff 463572.Sep 28 2022, 8:33 AM
bixia marked 3 inline comments as done and an inline comment as not done.

Address review comments.

mlir/lib/Dialect/SparseTensor/Pipelines/SparseTensorPipelines.cpp
67

Per offline discussion, this needs to be a global pass for adding new funcs.

mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
93

I thought about this, but found an MLIR example

127–128

We use x0, x1 to refer to the indices for dim0, dim1. So I call them valuei, valuej, and vi, vj for short. I use the same naming in the integration test.

171–174

Acknowledge.

283–292

We can't make quicksort stable without using extra storage.
Per offline discussion, we will add an attribute for requesting a stable sort and implement another algorithm for that.
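As a hedged sketch of the extra-storage point (plain C++, not the MLIR the pass would emit, and stableOrder is a hypothetical name): one standard way to get stability is to stable-sort an auxiliary permutation by key and then apply it to every parallel array:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// The permutation vector is the extra storage that an in-place quicksort
// lacks: stable-sorting it by key preserves the relative order of equal
// keys, which plain quicksort partitioning does not guarantee.
static std::vector<int> stableOrder(const std::vector<int> &keys) {
  std::vector<int> perm(keys.size());
  std::iota(perm.begin(), perm.end(), 0); // identity permutation
  std::stable_sort(perm.begin(), perm.end(),
                   [&keys](int a, int b) { return keys[a] < keys[b]; });
  return perm; // apply perm to each parallel array to reorder stably
}
```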

350–361

Reusing the same MLIR routine for different static shapes with the same element types.

This is looking pretty good! A few minor comments and suggestions

mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
54

Can we make this a typedef, so that createFunc can appear on the same line in the argument list?

85

Rather than generating a function call for swapping, have you tried the parallel assignment ("tuples" in and out) for swapping, generating it inline?
I have not checked this fully, but hopefully that generates more efficient code.

if (i != j)
  x0[i], x1[i], ... = x0[j], x1[j], ...
93

I don't think that is the right example (the "unusedDims" are used to indicate the ones that are not used, right?).
But there was some discussion on this a while back, with most people in favor of keeping the variable names in the definition (omitting them in a declaration is accepted).

mlir/test/Dialect/SparseTensor/buffer_rewriting.mlir
7

do you want to add some CHECKing of the generated code?

I realize that being too pattern specific may be brittle, but you could perhaps test for some basic structure?

bixia updated this revision to Diff 463659.Sep 28 2022, 12:59 PM
bixia marked an inline comment as done.

Address review comments.

mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
85

Parallel assignment is C-like syntax. In MLIR, we need to load from and store to memrefs. That is what the code currently generates.
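A hedged C++ model of that point (the name maySwap is a hypothetical stand-in): at the memref level, the swap over k parallel buffers has to be spelled out as explicit load/store pairs, guarded by i != j:

```cpp
#include <cstddef>

// Models the generated "may swap" over k parallel buffers: for each
// buffer, two loads followed by two stores, skipped when i == j.
static void maySwap(int *const *xs, std::size_t k,
                    std::size_t i, std::size_t j) {
  if (i == j)
    return;
  for (std::size_t d = 0; d < k; ++d) {
    int vi = xs[d][i]; // load xs[d][i]
    int vj = xs[d][j]; // load xs[d][j]
    xs[d][i] = vj;     // store swapped value
    xs[d][j] = vi;     // store swapped value
  }
}
```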

93

Thanks! Keep the name then.

bixia marked an inline comment as done and an inline comment as not done.Sep 28 2022, 1:01 PM
bixia marked an inline comment as done.Sep 29 2022, 9:41 AM
aartbik accepted this revision.Sep 29 2022, 11:08 AM
aartbik added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
85

I was thinking of another compiler with the parallel assignment ;-)
But my major point of course was inlining the swaps vs. introducing a new function.

But we can always revisit that based on perf analysis.

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorPasses.cpp
162

This should be in already (rebase with main); also, the comment on L160 changed ;-)

This revision is now accepted and ready to land.Sep 29 2022, 11:08 AM
This revision was automatically updated to reflect the committed changes.