This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse][gpu] recognizing sddmm pattern in GPU libgen path
ClosedPublic

Authored by K-Wu on May 26 2023, 1:05 PM.

Diff Detail

Event Timeline

K-Wu created this revision.May 26 2023, 1:05 PM
Herald added a project: Restricted Project.
K-Wu updated this revision to Diff 526295.May 27 2023, 4:02 PM

rebase origin/main with sddmm op added

K-Wu updated this revision to Diff 526300.May 27 2023, 6:17 PM

clang-format

K-Wu updated this revision to Diff 528036.Jun 2 2023, 4:27 PM

rebase origin/main

K-Wu updated this revision to Diff 528041.Jun 2 2023, 5:09 PM

add test

K-Wu updated this revision to Diff 528042.Jun 2 2023, 5:10 PM

involving the test

K-Wu added inline comments.Jun 2 2023, 5:24 PM
mlir/test/Dialect/SparseTensor/GPU/gpu_sampled_matmul_lib.mlir
88

can we instead output #CSR type here?

K-Wu updated this revision to Diff 528046.Jun 2 2023, 5:30 PM

incorporating Aart's test case

K-Wu updated this revision to Diff 528053.Jun 2 2023, 5:58 PM

add an in-place (from above, i.e., the program input) op test

K-Wu updated this revision to Diff 528054.Jun 2 2023, 6:01 PM

format

K-Wu updated this revision to Diff 528474.Jun 5 2023, 9:13 AM

saving progress

K-Wu updated this revision to Diff 528902.Jun 6 2023, 9:15 AM

fixing compile error; removing repeated test case

K-Wu updated this revision to Diff 529431.Jun 7 2023, 2:00 PM

adding more tests

K-Wu updated this revision to Diff 529478.Jun 7 2023, 5:22 PM

update comment

K-Wu updated this revision to Diff 529479.Jun 7 2023, 5:22 PM

rebase origin/main

K-Wu updated this revision to Diff 529480.Jun 7 2023, 5:28 PM

drafting rewrite rule

K-Wu updated this revision to Diff 529498.Jun 7 2023, 9:34 PM

saving progress

K-Wu updated this revision to Diff 529662.Jun 8 2023, 10:19 AM

add rules and update tests

K-Wu updated this revision to Diff 529677.Jun 8 2023, 10:49 AM

fix FileCheck parsing error

K-Wu added inline comments.Jun 8 2023, 10:51 AM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
879

rewrite rule here

K-Wu added inline comments.Jun 8 2023, 10:53 AM
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
42 ↗(On Diff #529677)

TODO: add checking logic

K-Wu added inline comments.Jun 8 2023, 11:20 AM
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
42 ↗(On Diff #529677)

Didn't notice there is already checking logic down below. Closing this

K-Wu added inline comments.Jun 8 2023, 2:29 PM
mlir/test/Dialect/SparseTensor/GPU/gpu_sampled_matmul_lib.mlir
98

%1 = sparse_tensor_in_place_result %args: tensor<8x8xf64, #SM> // something new, I have to think about it

%2 = linalg.generic #trait_sampled_dense_dense
  ins(%args, %arga, %argb: tensor<8x8xf64, #SM>,
                           tensor<8x8xf64>, tensor<8x8xf64>)
  outs(%1: tensor<8x8xf64, #SM>) {
    ^bb(%s: f64, %a: f64, %b: f64, %x: f64):
      %p = arith.mulf %a, %b : f64   // dense contribution A(i,k) * B(k,j)
      %q = arith.mulf %s, %p : f64   // sample with S(i,j)
      %r = arith.addf %x, %q : f64   // accumulate into the output
      linalg.yield %r : f64
} -> tensor<8x8xf64, #SM>
return %2 : tensor<8x8xf64, #SM>
K-Wu updated this revision to Diff 529771.Jun 8 2023, 4:24 PM

fix integration test

K-Wu published this revision for review.Jun 8 2023, 5:33 PM
K-Wu updated this revision to Diff 530007.Jun 9 2023, 10:12 AM

rebase; clean up the rule a bit

K-Wu updated this revision to Diff 530010.Jun 9 2023, 10:17 AM

reformat

aartbik added inline comments.Jun 12 2023, 4:15 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
623

We test that C is sparse, but there is no test at all for D.

So the code below will sometimes crash, since it uses information on C, not D (e.g. isCOO).
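
For illustration, a rough sketch of the kind of input that would slip through (not taken from the patch; this assumes C denotes the sparse sampling matrix and D the init/output, and it borrows the #trait_sampled_dense_dense trait and #CSR encoding from the test): the sampling matrix is sparse, the init/output is dense, yet the matcher would accept it and later apply C's format queries where D's are needed.

// Hypothetical: sparse sampling matrix, but a dense init/output.
%d = linalg.generic #trait_sampled_dense_dense
  ins(%s, %a, %b : tensor<8x8xf64, #CSR>, tensor<8x8xf64>, tensor<8x8xf64>)
  outs(%init : tensor<8x8xf64>) {
    ^bb(%vs: f64, %va: f64, %vb: f64, %vx: f64):
      %p = arith.mulf %va, %vb : f64
      %q = arith.mulf %vs, %p : f64
      %r = arith.addf %vx, %q : f64
      linalg.yield %r : f64
  } -> tensor<8x8xf64>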

628

This comment says more than is done below; is this a note to self, or something else?

632

You are missing the d : row here.

644

This assumes that D already has exactly the same pattern as C (or, more precisely, the right length for all the arrays into which we copy the result). Perhaps we should start by simply returning C again after this operation, and find a way to allow this at the IR level.

723

This is the wrong replacement; we need

rewriter.replaceOpWithNewOp<sparse_tensor::LoadOp>(op, ....)

and probably without inserts.

mlir/test/Dialect/SparseTensor/GPU/gpu_sampled_matmul_lib.mlir
111

Can you remove this part with the reuse completely? I think we will just need the pattern above for proper testing.

mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
46 ↗(On Diff #530010)

Dense output does not go through the GPU kernel then, right?

K-Wu updated this revision to Diff 530714.Jun 12 2023, 4:53 PM

update test cases to reflect the actual need, though they are not passing yet

K-Wu updated this revision to Diff 530744.Jun 12 2023, 6:35 PM

incorporating Aart's ad hoc solution

aartbik added inline comments.Jun 12 2023, 6:54 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
617

please fix the 80-column line breaks in this comment

619

I think we need a

if (op.getDpsInitOperand(0) != c)
return failure();

here to enforce the ad-hoc pattern

mlir/test/Dialect/SparseTensor/GPU/gpu_sampled_matmul_lib.mlir
105

perhaps add

TODO: find a cleaner way to express this

mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
47 ↗(On Diff #530744)

Put the comment about the ad-hoc idiom here too, so we don't forget.
Also the TODO.

92 ↗(On Diff #530744)

Just curious: do you really want to use the from-file setup of the sampling matrix? We can also just create it in memory like so many other examples, right?

aartbik added inline comments.Jun 12 2023, 8:36 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
619

Or, probably more precisely: after pre-processing we have

%0 = bufferization.alloc_tensor() copy(%arg0)

so that op.getDpsInitOperand(0).getDefiningOp() links to

%0 = bufferization.alloc_tensor() copy(%c)
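
As a concrete sketch of that shape (value names and the #CSR encoding are illustrative placeholders borrowed from the test, not the actual pass output): the init operand of the generic is the alloc_tensor copy of the sampling matrix, and the check above has to look through that copy.

%0 = bufferization.alloc_tensor() copy(%c) : tensor<8x8xf64, #CSR>
%1 = linalg.generic #trait_sampled_dense_dense
  ins(%c, %a, %b : tensor<8x8xf64, #CSR>, tensor<8x8xf64>, tensor<8x8xf64>)
  outs(%0 : tensor<8x8xf64, #CSR>) {
    ^bb(%vs: f64, %va: f64, %vb: f64, %vx: f64):
      %p = arith.mulf %va, %vb : f64
      %q = arith.mulf %vs, %p : f64
      %r = arith.addf %vx, %q : f64
      linalg.yield %r : f64
  } -> tensor<8x8xf64, #CSR>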

723

Make sure to return

sparse_tensor.load %arg0

instead of just %arg0, since the rematerialization needs to be made explicit.
You will find an example of this in, e.g., the method sparse_simply_dynamic1() in the sparse_out.mlir test.
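
At the IR level, a minimal sketch of what the rewriter should emit at the end (types and value names are placeholders borrowed from the 8x8 #CSR test): the updated sparse output is rematerialized explicitly before being returned, rather than forwarding %arg0 as-is.

%out = sparse_tensor.load %arg0 : tensor<8x8xf64, #CSR>
return %out : tensor<8x8xf64, #CSR>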

K-Wu marked 6 inline comments as done.Jun 12 2023, 8:44 PM
K-Wu updated this revision to Diff 530992.Jun 13 2023, 11:04 AM

still fixing

K-Wu updated this revision to Diff 530997.Jun 13 2023, 11:11 AM
K-Wu marked 4 inline comments as done.

addressing comments

K-Wu marked an inline comment as done.Jun 13 2023, 11:11 AM
K-Wu updated this revision to Diff 531044.Jun 13 2023, 1:19 PM

correcting the sddmm rewriter and tests except for the recognition rule

K-Wu updated this revision to Diff 531045.Jun 13 2023, 1:21 PM

addressing comment

K-Wu marked 2 inline comments as done.Jun 13 2023, 1:21 PM
K-Wu updated this revision to Diff 531515.Jun 14 2023, 2:21 PM

incorporating the new pattern

K-Wu updated this revision to Diff 531523.Jun 14 2023, 2:40 PM

tests seem to pass now

K-Wu added inline comments.Jun 14 2023, 2:45 PM
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
92 ↗(On Diff #530744)

Okay, I will incorporate it next time.

Peiming added inline comments.Jun 14 2023, 2:54 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
334–351

The nesting is too deep here. Could you either break it into more functions or use early returns on some simple conditions?

e.g.,

if (!isa_and_nonnull<...>(def))
   return false;

Value s_out = xxx
xxx
aartbik added inline comments.Jun 14 2023, 4:03 PM
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
47 ↗(On Diff #530744)

We can now use https://reviews.llvm.org/D152969
and remove all the comments about hacks, since this is a proper in-place definition that matches cuSPARSE precisely.

aartbik added inline comments.Jun 14 2023, 4:06 PM
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
274 ↗(On Diff #531523)

Actually, with the new formulation you no longer need this, since it now properly computes

S(I,J) = SPY( S(I,J) ) x SUM_K A(I,K) B(K,J) * alpha + beta * S(I,J)

K-Wu marked an inline comment as done.Jun 14 2023, 4:26 PM
K-Wu updated this revision to Diff 531567.Jun 14 2023, 4:26 PM

removing obsolete comments

K-Wu marked an inline comment as done.Jun 14 2023, 5:00 PM
K-Wu added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
334–351

Good point and example! Incorporated

K-Wu marked 2 inline comments as done.Jun 14 2023, 5:25 PM
K-Wu updated this revision to Diff 531581.Jun 14 2023, 5:26 PM

incorporating comments

Peiming added inline comments.Jun 14 2023, 7:19 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
335

You need to do a null check too

K-Wu updated this revision to Diff 531602.Jun 14 2023, 9:11 PM

addressing comment

K-Wu updated this revision to Diff 531603.Jun 14 2023, 9:12 PM

rebase origin/main

K-Wu marked an inline comment as done.Jun 15 2023, 1:23 AM
aartbik added inline comments.Jun 15 2023, 10:55 AM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
332

Add comment

// Helper to detect .....

344

return s_out == use->getOperand(0); ?

348

This seems way too simple. You pretty much match any

%1 = sparse_tensor.unary
%2 = sparse_tensor.reduce

but you don't check that %1 produces A * B and %2 performs the sum accumulation.

It will match the current example, but also much more! Please fix this.
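
For reference, a rough sketch of the body the matcher should actually pin down (paraphrased from the sampled-matmul test, assuming the fused three-operand form of #trait_sampled_dense_dense with maps for A, B, and S only; value names and the #CSR encoding are placeholders): %1 must be a sparse_tensor.unary whose present branch yields the A(i,k) * B(k,j) product, and %2 must be a sparse_tensor.reduce that performs a plain floating-point sum.

%out = linalg.generic #trait_sampled_dense_dense
  ins(%a, %b : tensor<8x8xf64>, tensor<8x8xf64>)
  outs(%s : tensor<8x8xf64, #CSR>) {
    ^bb(%va: f64, %vb: f64, %vs: f64):
      %f0 = arith.constant 0.0 : f64
      // %1: only the dense product, gated by the stored entries of S.
      %1 = sparse_tensor.unary %vs : f64 to f64
        present = {
          ^bb0(%p: f64):
            %m = arith.mulf %va, %vb : f64
            sparse_tensor.yield %m : f64
        }
        absent = {}
      // %2: sum accumulation of the sampled product.
      %2 = sparse_tensor.reduce %vs, %1, %f0 : f64 {
        ^bb0(%x: f64, %y: f64):
          %add = arith.addf %x, %y : f64
          sparse_tensor.yield %add : f64
      }
      linalg.yield %2 : f64
  } -> tensor<8x8xf64, #CSR>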

641

Remove this TODO. The current idiom has in-place semantics that will be handled correctly by bufferization.

mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
115 ↗(On Diff #531603)

Releasing %0 feels cleaner, so that the in-place behavior is clear.

K-Wu marked 4 inline comments as done.Jun 15 2023, 11:35 AM
K-Wu updated this revision to Diff 531851.Jun 15 2023, 11:36 AM

addressing comments

K-Wu updated this revision to Diff 531860.Jun 15 2023, 12:03 PM

fixing compile errors

aartbik added inline comments.Jun 15 2023, 12:08 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
335

You will often find the idiom

if (Operation *def = ....getDefiningOp()) {

use def

}

for this purpose, which you can even combine with the isa:

if (Operation *def = ....getDefiningOp<...>())

353

Technically, you have not checked whether the unary feeds into the reduce, right? Or did I miss that?

K-Wu updated this revision to Diff 531869.Jun 15 2023, 12:14 PM

fix errors

K-Wu updated this revision to Diff 531896.Jun 15 2023, 1:52 PM
K-Wu marked 2 inline comments as done.

fixing runtime error

K-Wu updated this revision to Diff 531897.Jun 15 2023, 1:57 PM

fix format

K-Wu marked an inline comment as done.Jun 15 2023, 2:14 PM
K-Wu added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
344

This is checked twice in the loop, so I don't see an easy way to simplify the logic.

K-Wu marked an inline comment as done.Jun 15 2023, 2:55 PM
aartbik accepted this revision.Jun 15 2023, 4:24 PM
This revision is now accepted and ready to land.Jun 15 2023, 4:24 PM