This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse][gpu] add spgemm operator
ClosedPublic

Authored by K-Wu on Jun 14 2023, 4:54 PM.

Diff Detail

Event Timeline

K-Wu created this revision.Jun 14 2023, 4:54 PM
K-Wu updated this revision to Diff 531847.Jun 15 2023, 11:24 AM

rebase origin/main

K-Wu updated this revision to Diff 532682.Jun 19 2023, 8:57 AM

stashing changes

K-Wu updated this revision to Diff 536358.Jun 30 2023, 12:11 PM

rebase origin/main

K-Wu updated this revision to Diff 536393.Jun 30 2023, 1:22 PM

rebase origin/main

K-Wu updated this revision to Diff 537500.Jul 5 2023, 2:25 PM

nothing should be changed

K-Wu updated this revision to Diff 541779.Jul 18 2023, 4:35 PM

uploading changes

K-Wu updated this revision to Diff 541804.Jul 18 2023, 6:28 PM

adding ops

K-Wu updated this revision to Diff 542181.Jul 19 2023, 1:57 PM

added GPU rewriter pass

K-Wu updated this revision to Diff 542266.Jul 19 2023, 5:56 PM

finish all gpu rewriters

K-Wu updated this revision to Diff 542311.Jul 19 2023, 10:53 PM

add test

K-Wu updated this revision to Diff 542602.Jul 20 2023, 11:10 AM

upd tests

K-Wu published this revision for review.Jul 20 2023, 11:16 AM
K-Wu added reviewers: Peiming, bixia, wrengr, yinying-lisa-li.
Peiming added inline comments.Jul 20 2023, 1:05 PM
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
798–808 ↗(On Diff #542604)

I somehow feel that we have repeated this code so many times, do you think you can make it a macro instead?

K-Wu added inline comments.Jul 20 2023, 1:37 PM
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
798–808 ↗(On Diff #542604)

Good point. I will think about it.

K-Wu added inline comments.Jul 20 2023, 5:39 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2186

TODO: remove

K-Wu added a comment.Jul 20 2023, 7:23 PM

addressed self-comment

wrengr added inline comments.Jul 21 2023, 3:20 PM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
125–129 ↗(On Diff #542744)

This code has been repeated a bunch of times now. You should define a new tblgen-class so that you can then simply say def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SparseSpGEMMOpHandleType, "SpGEMM operation">; and analogously for GPU_SparseSpMatHandle and GPU_SparseDnTensorHandle

Even better, it'd be nice to hook that GPU_SparseHandle tblgen-class in with the SparseHandleKind enum (also moving that enum from C++ to tblgen), so that you can further simplify that to def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SpGEMMOp>; where the "SpGEMM operation" string is given as the description/summary of the SpGEMMOp member of the tblgen-enum.

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1864–1866

does CUDA specify this discrepancy between the numeric value and the algorithm name? If it doesn't, then you may want to make the numeric values match the algorithm name for better clarity

wrengr added inline comments.Jul 21 2023, 3:20 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2189

This description needs to be updated to actually say something about what the op is/does. Ditto for all the other new ops

I feels like some parts are still missing here. Also, do we need the full fine-grained GPU dialect op <-> cusparse lib call correspondence
can we start by merging a few of the set-up lib calls into one gpu dialect op for now (similar to what we did for cusparselt 2:4)

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
125–129 ↗(On Diff #542744)

+1 on at least the first improvement

(at first glance I was a bit surprised to see a handle for an operation and not the data, but after reading GEMM in detail, I get it ;-)

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1864–1866

Yeah, it even seems stranger. But perhaps we can keep the 3,4,5 directly here?

typedef enum {

CUSPARSE_SPGEMM_DEFAULT                 = 0,
CUSPARSE_SPGEMM_CSR_ALG_DETERMINITIC    = 1,
CUSPARSE_SPGEMM_CSR_ALG_NONDETERMINITIC = 2,
CUSPARSE_SPGEMM_ALG1                    = 3,
CUSPARSE_SPGEMM_ALG2                    = 4,
CUSPARSE_SPGEMM_ALG3                    = 5

} cusparseSpGEMMAlg_t;

1874

why?

doc says

Default algorithm. Currently, it is CUSPARSE_SPGEMM_ALG1.

2189

+1

2210

seems like you did not finish the description?

2215

missing argument, no return?

gpu.spgemm_destroy_descr %descriptor

2285

missing?

2290

seems not right?

2403

this is the copy op?

2417

not complete?

2460

not copy?

2462

get size?

2496

empty line for space

K-Wu updated this revision to Diff 546193.Aug 1 2023, 1:06 PM
K-Wu marked 5 inline comments as done.

addressing comments

K-Wu added a comment.Aug 1 2023, 1:06 PM

addressing some comments

K-Wu updated this revision to Diff 546253.Aug 1 2023, 3:47 PM
K-Wu marked 11 inline comments as done.

addressing comments

K-Wu added a comment.Aug 1 2023, 3:52 PM

addressing comments

K-Wu updated this revision to Diff 546254.Aug 1 2023, 3:54 PM

better document

aartbik added inline comments.Aug 1 2023, 5:19 PM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
120 ↗(On Diff #546254)

"SpGEMM operation" -> "SpGEMM operation handle type"

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1874

doesn't this fit on one line now? or is this forced by lint?

2189

can we say a bit more? the doc now pretty much says what the name already implies.
perhaps say how such a descriptor is used subsequently

2240

could we combine this with spgemm_estimate_memory to reduce the number of ops
(i.e. if they always appear in sequence, we could keep the operation footprint a bit less compared to actual lib calls)?

2253

can we break this over multiple lines to stay in 80-col

2301

how does this perform the actual computation?

2360

wait, this is no longer used to *determine* buffer size, right?
you used use the buffer that was allocated

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
798–808 ↗(On Diff #542604)

yeah, I am also thinking that keeping the definition and declaration together would already go a long way reducing boilerplate quite a bit
but, if we use a macro of the code here, then that would also be more readable and we can keep the code below

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
481 ↗(On Diff #546254)

we repeat this TODO everywhere, shall we just mention it once in the alpha/beta macro

mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir
123 ↗(On Diff #546254)

break lone line

128 ↗(On Diff #546254)

and here

mlir/test/Dialect/GPU/sparse-roundtrip.mlir
95 ↗(On Diff #546254)

break

100 ↗(On Diff #546254)

break

wrengr added inline comments.Aug 2 2023, 12:42 PM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
116 ↗(On Diff #546254)

Fwiw, The thing I had in mind for this class would be to use description#" handle type" here, since that string suffix is repeated by all the instances of this class.

120 ↗(On Diff #546254)

cf my comment on line 116

wrengr added inline comments.Aug 2 2023, 1:10 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1864–1866

Is there any reason not to also include DEFAULT? Especially if we do indeed make it the GPU_SpGEMMAlgAttr.defaultValue...

(I'd suggest also adding CSR_ALG_{,NON}DETERMINITIC, though that'd require some extra validation to make sure that the arguments are CSR whenever those algos are requested. Though I'd imagine that we already have similar requirements for Alg{1,2,3}...)

2240

And if you're worried that "even though they always go together now, perhaps we'll need to split them up later on", then that could always be done via giving the op a new argument enum Estimate { Work, Memory, Both }. ...That is, assuming the extra buffer arguments to the function calls can be easily hidden away as part of the op lowering.

Since our goal is to build a compiler rather than to provide MLIR bindings to the library per se, I think it's better to try to keep the ops as high-level as we actually want/need for the compiler, rather than going for 1-to-1 correspondence with the library function calls. Keeping things at the granularity of the compiler often helps both to improve codegen and to avoid compatibility issues with different library versions. Granted there's always a balancing act for this sort of thing, but nevertheless

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317 ↗(On Diff #546254)

I'm not sure what style is used in the rest of the file/dialect, but is it really necessary/helpful to provide the C/C++ types for each of the arguments, given as they have an easy 1-to-1 mapping to the LLVM types? Giving the names of the arguments makes sense, but the types less so imo

aartbik added inline comments.Aug 3 2023, 9:25 AM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
120 ↗(On Diff #546254)

yeah, that would work (the full description would then only be visible in doc page, and not inline in code, but I like that this automated way forces consistently)

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2240

+1 on the compiler vs 1:1 lib argument, and the proposal for the future

K-Wu updated this revision to Diff 546943.Aug 3 2023, 10:58 AM
K-Wu marked 11 inline comments as done.

addressing comments

K-Wu added inline comments.Aug 3 2023, 11:09 AM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2360

cusparse spGEMM APIs are very wired. i am merely restating what I learnt from its documentation but not sure if spgemm_compute only does the computation

The functions cusparseSpGEMM_workEstimation(), cusparseSpGEMM_estimateMemory(), and cusparseSpGEMM_compute() are used for both determining the buffer size and performing the actual computation.

wrengr added inline comments.Aug 3 2023, 12:37 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1873

Nit: should have a space between the "> {"

2193

Should wrap this in backticks so that it's formatted/rendered properly. Ditto for analogous documentation on other ops

2199–2200

Shouldn't there only be one of these triple-backtick lines? Ditto for the other ops

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
343–347 ↗(On Diff #546943)

Should drop the void* part of the comments here, for consistency

K-Wu updated this revision to Diff 547360.Aug 4 2023, 2:48 PM
K-Wu marked an inline comment as done.

introducing macros

K-Wu updated this revision to Diff 547363.Aug 4 2023, 2:50 PM

addressing comments

K-Wu updated this revision to Diff 547366.Aug 4 2023, 2:55 PM
K-Wu marked 4 inline comments as done.

addressing comments

K-Wu added a comment.Aug 4 2023, 2:55 PM

addressing some comments

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2193

Got it. I will apply backticks in all the new operators in this revision, and create a dedicated revision to applying backtick to other tablegen entries and applying the declaration macro to other ops.

K-Wu updated this revision to Diff 547381.Aug 4 2023, 3:47 PM
K-Wu marked 3 inline comments as done.

addressing some comments

K-Wu updated this revision to Diff 547906.Aug 7 2023, 12:18 PM
K-Wu marked an inline comment as done.

reducing op num

K-Wu marked 2 inline comments as done.Aug 7 2023, 12:18 PM
K-Wu updated this revision to Diff 547915.Aug 7 2023, 12:35 PM

addressing formatting

K-Wu marked an inline comment as done.Aug 7 2023, 12:36 PM
aartbik added inline comments.Aug 7 2023, 12:48 PM
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
119 ↗(On Diff #547915)

sparse matrix ->dense matrix

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2188

I would drop the "supported by sparse tensor", just

"selected algorithm for SpGEMM",

K-Wu updated this revision to Diff 547942.Aug 7 2023, 2:11 PM
K-Wu marked 2 inline comments as done.

addressing comments

aartbik accepted this revision.Aug 7 2023, 4:16 PM
aartbik added inline comments.
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
711 ↗(On Diff #547942)

we can use this for ALL rules now, right? I can see you don't want to do that in this CL,
but add a TODO for follow up

311–317 ↗(On Diff #546254)

I agree we should find an automated way for this, both the types and names.

As for the names, now, it is inconsistent with the other parts, since we only documented stream for some reason ;-)
But for the long lists, it actually does help to find the right ones. We still have the problem that we document the
type for stream (as in void*stream), but not the others, but dropping the stream type would be inconsistent with the rest.

So, okay to leave what you have now, but we probably need to revisit this one day.

This revision is now accepted and ready to land.Aug 7 2023, 4:16 PM
wrengr added inline comments.Aug 7 2023, 4:22 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2213–2214

nit: should reformat this to avoid overly long lines

2395

nit: reformat to avoid overly long lines

K-Wu updated this revision to Diff 548005.Aug 7 2023, 5:28 PM
K-Wu marked 3 inline comments as done.

addressing comments

K-Wu added a comment.Aug 7 2023, 5:31 PM

addressing comments

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317 ↗(On Diff #546254)

noted :D

This revision was landed with ongoing or failed builds.Aug 7 2023, 5:36 PM
This revision was automatically updated to reflect the committed changes.