This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse][gpu] add spgemm operator
ClosedPublic

Authored by K-Wu on Jun 14 2023, 4:54 PM.

Diff Detail

Event Timeline

K-Wu created this revision.Jun 14 2023, 4:54 PM
K-Wu updated this revision to Diff 531847.Jun 15 2023, 11:24 AM

rebase origin/main

K-Wu updated this revision to Diff 532682.Jun 19 2023, 8:57 AM

stashing changes

K-Wu updated this revision to Diff 536358.Jun 30 2023, 12:11 PM

rebase origin/main

K-Wu updated this revision to Diff 536393.Jun 30 2023, 1:22 PM

rebase origin/main

K-Wu updated this revision to Diff 537500.Jul 5 2023, 2:25 PM

nothing should be changed

K-Wu updated this revision to Diff 541779.Jul 18 2023, 4:35 PM

uploading changes

K-Wu updated this revision to Diff 541804.Jul 18 2023, 6:28 PM

adding ops

K-Wu updated this revision to Diff 542181.Jul 19 2023, 1:57 PM

added GPU rewriter pass

K-Wu updated this revision to Diff 542266.Jul 19 2023, 5:56 PM

finish all gpu rewriters

K-Wu updated this revision to Diff 542311.Jul 19 2023, 10:53 PM

add test

K-Wu updated this revision to Diff 542602.Jul 20 2023, 11:10 AM

upd tests

K-Wu published this revision for review.Jul 20 2023, 11:16 AM
K-Wu added reviewers: Peiming, bixia, wrengr, yinying-lisa-li.
Peiming added inline comments.Jul 20 2023, 1:05 PM
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
798–808

I somehow feel that we have repeated this code so many times, do you think you can make it a macro instead?

K-Wu added inline comments.Jul 20 2023, 1:37 PM
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
798–808

Good point. I will think about it.

K-Wu added inline comments.Jul 20 2023, 5:39 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2130

TODO: remove

K-Wu added a comment.Jul 20 2023, 7:23 PM

addressed self-comment

wrengr added inline comments.Jul 21 2023, 3:20 PM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
125–129

This code has been repeated a bunch of times now. You should define a new tblgen-class so that you can then simply say def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SparseSpGEMMOpHandleType, "SpGEMM operation">; and analogously for GPU_SparseSpMatHandle and GPU_SparseDnTensorHandle

Even better, it'd be nice to hook that GPU_SparseHandle tblgen-class in with the SparseHandleKind enum (also moving that enum from C++ to tblgen), so that you can further simplify that to def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SpGEMMOp>; where the "SpGEMM operation" string is given as the description/summary of the SpGEMMOp member of the tblgen-enum.

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1805–1807

does CUDA specify this discrepancy between the numeric value and the algorithm name? If it doesn't, then you may want to make the numeric values match the algorithm name for better clarity

wrengr added inline comments.Jul 21 2023, 3:20 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2133

This description needs to be updated to actually say something about what the op is/does. Ditto for all the other new ops

I feels like some parts are still missing here. Also, do we need the full fine-grained GPU dialect op <-> cusparse lib call correspondence
can we start by merging a few of the set-up lib calls into one gpu dialect op for now (similar to what we did for cusparselt 2:4)

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
125–129

+1 on at least the first improvement

(at first glance I was a bit surprised to see a handle for an operation and not the data, but after reading GEMM in detail, I get it ;-)

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1805–1807

Yeah, it even seems stranger. But perhaps we can keep the 3,4,5 directly here?

typedef enum {

CUSPARSE_SPGEMM_DEFAULT                 = 0,
CUSPARSE_SPGEMM_CSR_ALG_DETERMINITIC    = 1,
CUSPARSE_SPGEMM_CSR_ALG_NONDETERMINITIC = 2,
CUSPARSE_SPGEMM_ALG1                    = 3,
CUSPARSE_SPGEMM_ALG2                    = 4,
CUSPARSE_SPGEMM_ALG3                    = 5

} cusparseSpGEMMAlg_t;

1815

why?

doc says

Default algorithm. Currently, it is CUSPARSE_SPGEMM_ALG1.

2133

+1

2154

seems like you did not finish the description?

2159

missing argument, no return?

gpu.spgemm_destroy_descr %descriptor

2229

missing?

2234

seems not right?

2347

this is the copy op?

2361

not complete?

2404

not copy?

2406

get size?

2440

empty line for space

K-Wu updated this revision to Diff 546193.Aug 1 2023, 1:06 PM
K-Wu marked 5 inline comments as done.

addressing comments

K-Wu added a comment.Aug 1 2023, 1:06 PM

addressing some comments

K-Wu updated this revision to Diff 546253.Aug 1 2023, 3:47 PM
K-Wu marked 11 inline comments as done.

addressing comments

K-Wu added a comment.Aug 1 2023, 3:52 PM

addressing comments

K-Wu updated this revision to Diff 546254.Aug 1 2023, 3:54 PM

better document

aartbik added inline comments.Aug 1 2023, 5:19 PM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
130

"SpGEMM operation" -> "SpGEMM operation handle type"

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1815

doesn't this fit on one line now? or is this forced by lint?

2133

can we say a bit more? the doc now pretty much says what the name already implies.
perhaps say how such a descriptor is used subsequently

2184

could we combine this with spgemm_estimate_memory to reduce the number of ops
(i.e. if they always appear in sequence, we could keep the operation footprint a bit less compared to actual lib calls)?

2197

can we break this over multiple lines to stay in 80-col

2245

how does this perform the actual computation?

2304

wait, this is no longer used to *determine* buffer size, right?
you used use the buffer that was allocated

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
798–808

yeah, I am also thinking that keeping the definition and declaration together would already go a long way reducing boilerplate quite a bit
but, if we use a macro of the code here, then that would also be more readable and we can keep the code below

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
481

we repeat this TODO everywhere, shall we just mention it once in the alpha/beta macro

mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir
123

break lone line

128

and here

mlir/test/Dialect/GPU/sparse-roundtrip.mlir
95

break

100

break

wrengr added inline comments.Aug 2 2023, 12:42 PM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
126

Fwiw, The thing I had in mind for this class would be to use description#" handle type" here, since that string suffix is repeated by all the instances of this class.

130

cf my comment on line 116

wrengr added inline comments.Aug 2 2023, 1:10 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1805–1807

Is there any reason not to also include DEFAULT? Especially if we do indeed make it the GPU_SpGEMMAlgAttr.defaultValue...

(I'd suggest also adding CSR_ALG_{,NON}DETERMINITIC, though that'd require some extra validation to make sure that the arguments are CSR whenever those algos are requested. Though I'd imagine that we already have similar requirements for Alg{1,2,3}...)

2184

And if you're worried that "even though they always go together now, perhaps we'll need to split them up later on", then that could always be done via giving the op a new argument enum Estimate { Work, Memory, Both }. ...That is, assuming the extra buffer arguments to the function calls can be easily hidden away as part of the op lowering.

Since our goal is to build a compiler rather than to provide MLIR bindings to the library per se, I think it's better to try to keep the ops as high-level as we actually want/need for the compiler, rather than going for 1-to-1 correspondence with the library function calls. Keeping things at the granularity of the compiler often helps both to improve codegen and to avoid compatibility issues with different library versions. Granted there's always a balancing act for this sort of thing, but nevertheless

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317

I'm not sure what style is used in the rest of the file/dialect, but is it really necessary/helpful to provide the C/C++ types for each of the arguments, given as they have an easy 1-to-1 mapping to the LLVM types? Giving the names of the arguments makes sense, but the types less so imo

aartbik added inline comments.Aug 3 2023, 9:25 AM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
130

yeah, that would work (the full description would then only be visible in doc page, and not inline in code, but I like that this automated way forces consistently)

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2184

+1 on the compiler vs 1:1 lib argument, and the proposal for the future

K-Wu updated this revision to Diff 546943.Aug 3 2023, 10:58 AM
K-Wu marked 11 inline comments as done.

addressing comments

K-Wu added inline comments.Aug 3 2023, 11:09 AM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2304

cusparse spGEMM APIs are very wired. i am merely restating what I learnt from its documentation but not sure if spgemm_compute only does the computation

The functions cusparseSpGEMM_workEstimation(), cusparseSpGEMM_estimateMemory(), and cusparseSpGEMM_compute() are used for both determining the buffer size and performing the actual computation.

wrengr added inline comments.Aug 3 2023, 12:37 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1814

Nit: should have a space between the "> {"

2137

Should wrap this in backticks so that it's formatted/rendered properly. Ditto for analogous documentation on other ops

2143–2144

Shouldn't there only be one of these triple-backtick lines? Ditto for the other ops

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
343–347

Should drop the void* part of the comments here, for consistency

K-Wu updated this revision to Diff 547360.Aug 4 2023, 2:48 PM
K-Wu marked an inline comment as done.

introducing macros

K-Wu updated this revision to Diff 547363.Aug 4 2023, 2:50 PM

addressing comments

K-Wu updated this revision to Diff 547366.Aug 4 2023, 2:55 PM
K-Wu marked 4 inline comments as done.

addressing comments

K-Wu added a comment.Aug 4 2023, 2:55 PM

addressing some comments

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2137

Got it. I will apply backticks in all the new operators in this revision, and create a dedicated revision to applying backtick to other tablegen entries and applying the declaration macro to other ops.

K-Wu updated this revision to Diff 547381.Aug 4 2023, 3:47 PM
K-Wu marked 3 inline comments as done.

addressing some comments

K-Wu updated this revision to Diff 547906.Aug 7 2023, 12:18 PM
K-Wu marked an inline comment as done.

reducing op num

K-Wu marked 2 inline comments as done.Aug 7 2023, 12:18 PM
K-Wu updated this revision to Diff 547915.Aug 7 2023, 12:35 PM

addressing formatting

K-Wu marked an inline comment as done.Aug 7 2023, 12:36 PM
aartbik added inline comments.Aug 7 2023, 12:48 PM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
129

sparse matrix ->dense matrix

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2132

I would drop the "supported by sparse tensor", just

"selected algorithm for SpGEMM",

K-Wu updated this revision to Diff 547942.Aug 7 2023, 2:11 PM
K-Wu marked 2 inline comments as done.

addressing comments

aartbik accepted this revision.Aug 7 2023, 4:16 PM
aartbik added inline comments.
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317

I agree we should find an automated way for this, both the types and names.

As for the names, now, it is inconsistent with the other parts, since we only documented stream for some reason ;-)
But for the long lists, it actually does help to find the right ones. We still have the problem that we document the
type for stream (as in void*stream), but not the others, but dropping the stream type would be inconsistent with the rest.

So, okay to leave what you have now, but we probably need to revisit this one day.

716

we can use this for ALL rules now, right? I can see you don't want to do that in this CL,
but add a TODO for follow up

This revision is now accepted and ready to land.Aug 7 2023, 4:16 PM
wrengr added inline comments.Aug 7 2023, 4:22 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2157–2158

nit: should reformat this to avoid overly long lines

2339

nit: reformat to avoid overly long lines

K-Wu updated this revision to Diff 548005.Aug 7 2023, 5:28 PM
K-Wu marked 3 inline comments as done.

addressing comments

K-Wu added a comment.Aug 7 2023, 5:31 PM

addressing comments

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317

noted :D

This revision was landed with ongoing or failed builds.Aug 7 2023, 5:36 PM
This revision was automatically updated to reflect the committed changes.