This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
125–129 ↗	(On Diff #542744)	This code has been repeated a bunch of times now. You should define a new tblgen-class so that you can then simply say `def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SparseSpGEMMOpHandleType, "SpGEMM operation">;` and analogously for `GPU_SparseSpMatHandle` and `GPU_SparseDnTensorHandle` Even better, it'd be nice to hook that `GPU_SparseHandle` tblgen-class in with the `SparseHandleKind` enum (also moving that enum from C++ to tblgen), so that you can further simplify that to `def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SpGEMMOp>;` where the `"SpGEMM operation"` string is given as the description/summary of the `SpGEMMOp` member of the tblgen-enum.
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1864–1866	does CUDA specify this discrepancy between the numeric value and the algorithm name? If it doesn't, then you may want to make the numeric values match the algorithm name for better clarity

wrengr added inline comments.Jul 21 2023, 3:20 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2189	This description needs to be updated to actually say something about what the op is/does. Ditto for all the other new ops

I feels like some parts are still missing here. Also, do we need the full fine-grained GPU dialect op <-> cusparse lib call correspondence
can we start by merging a few of the set-up lib calls into one gpu dialect op for now (similar to what we did for cusparselt 2:4)

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
125–129 ↗	(On Diff #542744)	+1 on at least the first improvement (at first glance I was a bit surprised to see a handle for an operation and not the data, but after reading GEMM in detail, I get it ;-)
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1864–1866	Yeah, it even seems stranger. But perhaps we can keep the 3,4,5 directly here? typedef enum { CUSPARSE_SPGEMM_DEFAULT = 0, CUSPARSE_SPGEMM_CSR_ALG_DETERMINITIC = 1, CUSPARSE_SPGEMM_CSR_ALG_NONDETERMINITIC = 2, CUSPARSE_SPGEMM_ALG1 = 3, CUSPARSE_SPGEMM_ALG2 = 4, CUSPARSE_SPGEMM_ALG3 = 5 } cusparseSpGEMMAlg_t;
1874	why? doc says Default algorithm. Currently, it is CUSPARSE_SPGEMM_ALG1.
2189	+1
2210	seems like you did not finish the description?
2215	missing argument, no return? gpu.spgemm_destroy_descr %descriptor
2285	missing?
2290	seems not right?
2403	this is the copy op?
2417	not complete?
2460	not copy?
2462	get size?
2496	empty line for space

addressing comments

addressing some comments

Harbormaster completed remote builds in B249584: Diff 546193.Aug 1 2023, 2:27 PM

addressing comments

better document

Harbormaster completed remote builds in B249620: Diff 546254.Aug 1 2023, 5:13 PM

aartbik added inline comments.Aug 1 2023, 5:19 PM

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
120 ↗	(On Diff #546254)	"SpGEMM operation" -> "SpGEMM operation handle type"
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1874	doesn't this fit on one line now? or is this forced by lint?
2189	can we say a bit more? the doc now pretty much says what the name already implies. perhaps say how such a descriptor is used subsequently
2240	could we combine this with spgemm_estimate_memory to reduce the number of ops (i.e. if they always appear in sequence, we could keep the operation footprint a bit less compared to actual lib calls)?
2253	can we break this over multiple lines to stay in 80-col
2301	how does this perform the actual computation?
2360	wait, this is no longer used to determine buffer size, right? you used use the buffer that was allocated
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
798–808 ↗	(On Diff #542604)	yeah, I am also thinking that keeping the definition and declaration together would already go a long way reducing boilerplate quite a bit but, if we use a macro of the code here, then that would also be more readable and we can keep the code below
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
481 ↗	(On Diff #546254)	we repeat this TODO everywhere, shall we just mention it once in the alpha/beta macro
mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir
123 ↗	(On Diff #546254)	break lone line
128 ↗	(On Diff #546254)	and here
mlir/test/Dialect/GPU/sparse-roundtrip.mlir
95 ↗	(On Diff #546254)	break
100 ↗	(On Diff #546254)	break

wrengr added inline comments.Aug 2 2023, 12:42 PM

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
116 ↗	(On Diff #546254)	Fwiw, The thing I had in mind for this class would be to use `description#" handle type"` here, since that string suffix is repeated by all the instances of this class.
120 ↗	(On Diff #546254)	cf my comment on line 116

wrengr added inline comments.Aug 2 2023, 1:10 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1864–1866	Is there any reason not to also include `DEFAULT`? Especially if we do indeed make it the `GPU_SpGEMMAlgAttr.defaultValue`... (I'd suggest also adding `CSR_ALG_{,NON}DETERMINITIC`, though that'd require some extra validation to make sure that the arguments are CSR whenever those algos are requested. Though I'd imagine that we already have similar requirements for `Alg{1,2,3}`...)
2240	And if you're worried that "even though they always go together now, perhaps we'll need to split them up later on", then that could always be done via giving the op a new argument `enum Estimate { Work, Memory, Both }`. ...That is, assuming the extra buffer arguments to the function calls can be easily hidden away as part of the op lowering. Since our goal is to build a compiler rather than to provide MLIR bindings to the library per se, I think it's better to try to keep the ops as high-level as we actually want/need for the compiler, rather than going for 1-to-1 correspondence with the library function calls. Keeping things at the granularity of the compiler often helps both to improve codegen and to avoid compatibility issues with different library versions. Granted there's always a balancing act for this sort of thing, but nevertheless
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317 ↗	(On Diff #546254)	I'm not sure what style is used in the rest of the file/dialect, but is it really necessary/helpful to provide the C/C++ types for each of the arguments, given as they have an easy 1-to-1 mapping to the LLVM types? Giving the names of the arguments makes sense, but the types less so imo

aartbik added inline comments.Aug 3 2023, 9:25 AM

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
120 ↗	(On Diff #546254)	yeah, that would work (the full description would then only be visible in doc page, and not inline in code, but I like that this automated way forces consistently)
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2240	+1 on the compiler vs 1:1 lib argument, and the proposal for the future

addressing comments

K-Wu added inline comments.Aug 3 2023, 11:09 AM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2360	cusparse spGEMM APIs are very wired. i am merely restating what I learnt from its documentation but not sure if spgemm_compute only does the computation The functions cusparseSpGEMM_workEstimation(), cusparseSpGEMM_estimateMemory(), and cusparseSpGEMM_compute() are used for both determining the buffer size and performing the actual computation.

wrengr added inline comments.Aug 3 2023, 12:37 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1873	Nit: should have a space between the "`> {`"
2193	Should wrap this in backticks so that it's formatted/rendered properly. Ditto for analogous documentation on other ops
2199–2200	Shouldn't there only be one of these triple-backtick lines? Ditto for the other ops
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
343–347 ↗	(On Diff #546943)	Should drop the `void*` part of the comments here, for consistency

Harbormaster completed remote builds in B250127: Diff 546943.Aug 3 2023, 1:10 PM

introducing macros

addressing comments

addressing some comments

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2193	Got it. I will apply backticks in all the new operators in this revision, and create a dedicated revision to applying backtick to other tablegen entries and applying the declaration macro to other ops.

addressing some comments

Harbormaster completed remote builds in B250452: Diff 547381.Aug 4 2023, 5:17 PM

reducing op num

K-Wu marked 2 inline comments as done.Aug 7 2023, 12:18 PM

addressing formatting

K-Wu marked an inline comment as done.Aug 7 2023, 12:36 PM

aartbik added inline comments.Aug 7 2023, 12:48 PM

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
119 ↗	(On Diff #547915)	sparse matrix ->dense matrix
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2188	I would drop the "supported by sparse tensor", just "selected algorithm for SpGEMM",

addressing comments

aartbik accepted this revision.Aug 7 2023, 4:16 PM

aartbik added inline comments.

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
711 ↗	(On Diff #547942)	we can use this for ALL rules now, right? I can see you don't want to do that in this CL, but add a TODO for follow up
311–317 ↗	(On Diff #546254)	I agree we should find an automated way for this, both the types and names. As for the names, now, it is inconsistent with the other parts, since we only documented stream for some reason ;-) But for the long lists, it actually does help to find the right ones. We still have the problem that we document the type for stream (as in void*stream), but not the others, but dropping the stream type would be inconsistent with the rest. So, okay to leave what you have now, but we probably need to revisit this one day.

This revision is now accepted and ready to land.Aug 7 2023, 4:16 PM

wrengr added inline comments.Aug 7 2023, 4:22 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2213–2214	nit: should reformat this to avoid overly long lines
2395	nit: reformat to avoid overly long lines

addressing comments

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317 ↗	(On Diff #546254)	noted :D

This revision was landed with ongoing or failed builds.Aug 7 2023, 5:36 PM

Closed by commit rGdfe294290948: [mlir][sparse][gpu] add spgemm operator (authored by K-Wu). · Explain Why

This revision was automatically updated to reflect the committed changes.

K-Wu added a commit: rGdfe294290948: [mlir][sparse][gpu] add spgemm operator.

Harbormaster completed remote builds in B250954: Diff 548005.Aug 7 2023, 8:28 PM

Revision Contents

Path

Size

mlir/

include/

mlir/

Dialect/

GPU/

IR/

GPUOps.td

2 lines

Diff 532298

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

Show First 20 Lines • Show All 1,855 Lines • ▼ Show 20 Lines
def GPU_TransposeModeAttr : EnumAttr<GPU_Dialect, GPU_TransposeMode,		def GPU_TransposeModeAttr : EnumAttr<GPU_Dialect, GPU_TransposeMode,
"mat_transpose_mode">{		"mat_transpose_mode">{
let defaultValue = "TransposeMode::NON_TRANSPOSE";		let defaultValue = "TransposeMode::NON_TRANSPOSE";
}		}

def GPU_SpMVBufferSizeOp : GPU_Op<"spmv_buffer_size", [GPU_AsyncOpInterface]> {		def GPU_SpMVBufferSizeOp : GPU_Op<"spmv_buffer_size", [GPU_AsyncOpInterface]> {
let summary = "Precompute buffersize for SpMV operation";		let summary = "Precompute buffersize for SpMV operation";
let description = [{		let description = [{
The `gpu.spmv_buffer_size` operation returns the buffer size required		The `gpu.spmv_buffer_size` operation returns the buffer size required
to perform the SpMV operation on the given sparse matrix and dense vectors.		to perform the SpMV operation on the given sparse matrix and dense vectors.
The operation expects handles returned by previous sparse operations		The operation expects handles returned by previous sparse operations
		wrengrUnsubmitted Done Reply Inline Actions does CUDA specify this discrepancy between the numeric value and the algorithm name? If it doesn't, then you may want to make the numeric values match the algorithm name for better clarity wrengr: does CUDA specify this discrepancy between the numeric value and the algorithm name? If it…
		aartbikUnsubmitted Done Reply Inline Actions Yeah, it even seems stranger. But perhaps we can keep the 3,4,5 directly here? typedef enum { CUSPARSE_SPGEMM_DEFAULT = 0, CUSPARSE_SPGEMM_CSR_ALG_DETERMINITIC = 1, CUSPARSE_SPGEMM_CSR_ALG_NONDETERMINITIC = 2, CUSPARSE_SPGEMM_ALG1 = 3, CUSPARSE_SPGEMM_ALG2 = 4, CUSPARSE_SPGEMM_ALG3 = 5 } cusparseSpGEMMAlg_t; aartbik: Yeah, it even seems stranger. But perhaps we can keep the 3,4,5 directly here? typedef enum…
		wrengrUnsubmitted Done Reply Inline Actions Is there any reason not to also include `DEFAULT`? Especially if we do indeed make it the `GPU_SpGEMMAlgAttr.defaultValue`... (I'd suggest also adding `CSR_ALG_{,NON}DETERMINITIC`, though that'd require some extra validation to make sure that the arguments are CSR whenever those algos are requested. Though I'd imagine that we already have similar requirements for `Alg{1,2,3}`...) wrengr: Is there any reason not to also include `DEFAULT`? Especially if we do indeed make it the…
to construct an environment and the operands for SpMV.		to construct an environment and the operands for SpMV.

If the `async` keyword is present, the op is executed asynchronously (i.e.		If the `async` keyword is present, the op is executed asynchronously (i.e.
it does not block until the execution has finished on the device). In		it does not block until the execution has finished on the device). In
that case, it returns a !gpu.async.token in addition to the environment.		that case, it returns a !gpu.async.token in addition to the environment.

The matrix arguments can also be associated with one of the following		The matrix arguments can also be associated with one of the following
		wrengrUnsubmitted Done Reply Inline Actions Nit: should have a space between the "`> {`" wrengr: Nit: should have a space between the "`> {`"
operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE. The default value		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE. The default value
		aartbikUnsubmitted Done Reply Inline Actions why? doc says Default algorithm. Currently, it is CUSPARSE_SPGEMM_ALG1. aartbik: why? doc says Default algorithm. Currently, it is CUSPARSE_SPGEMM_ALG1.
		aartbikUnsubmitted Done Reply Inline Actions doesn't this fit on one line now? or is this forced by lint? aartbik: doesn't this fit on one line now? or is this forced by lint?
is NON_TRANSPOSE.		is NON_TRANSPOSE.

Example:		Example:

```mlir		```mlir
%buffersz, %token = gpu.spmv_buffer_size async [%dep] %env, %spmatA{TRANSPOSE}, %dnX, %dnY into f32		%buffersz, %token = gpu.spmv_buffer_size async [%dep] %env, %spmatA{TRANSPOSE}, %dnX, %dnY into f32
```		```
}];		}];
▲ Show 20 Lines • Show All 295 Lines • ▼ Show 20 Lines	def GPU_SDDMMOp : GPU_Op<"sddmm", [GPU_AsyncOpInterface]> {
];		];

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $dnmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $buffer attr-dict `:` type($buffer) `into` $computeType		$env `,` $dnmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $buffer attr-dict `:` type($buffer) `into` $computeType
}];		}];
}		}


		K-WuAuthorUnsubmitted Done Reply Inline Actions TODO: remove K-Wu: TODO: remove

#endif // GPU_OPS		#endif // GPU_OPS
		wrengrUnsubmitted Done Reply Inline Actions This description needs to be updated to actually say something about what the op is/does. Ditto for all the other new ops wrengr: This description needs to be updated to actually say something about what the op is/does. Ditto…
		aartbikUnsubmitted Done Reply Inline Actions +1 aartbik: +1
		aartbikUnsubmitted Done Reply Inline Actions empty line for space aartbik: empty line for space
		aartbikUnsubmitted Done Reply Inline Actions missing argument, no return? gpu.spgemm_destroy_descr %descriptor aartbik: missing argument, no return? gpu.spgemm_destroy_descr %descriptor
		aartbikUnsubmitted Done Reply Inline Actions seems like you did not finish the description? aartbik: seems like you did not finish the description?
		aartbikUnsubmitted Done Reply Inline Actions missing? aartbik: missing?
		aartbikUnsubmitted Done Reply Inline Actions seems not right? aartbik: seems not right?
		aartbikUnsubmitted Done Reply Inline Actions this is the copy op? aartbik: this is the copy op?
		aartbikUnsubmitted Done Reply Inline Actions not complete? aartbik: not complete?
		aartbikUnsubmitted Done Reply Inline Actions get size? aartbik: get size?
		aartbikUnsubmitted Done Reply Inline Actions not copy? aartbik: not copy?
		aartbikUnsubmitted Done Reply Inline Actions can we say a bit more? the doc now pretty much says what the name already implies. perhaps say how such a descriptor is used subsequently aartbik: can we say a bit more? the doc now pretty much says what the name already implies. perhaps say…
		aartbikUnsubmitted Done Reply Inline Actions can we break this over multiple lines to stay in 80-col aartbik: can we break this over multiple lines to stay in 80-col
		aartbikUnsubmitted Not Done Reply Inline Actions wait, this is no longer used to determine buffer size, right? you used use the buffer that was allocated aartbik: wait, this is no longer used to determine buffer size, right? you used use the buffer that…
		K-WuAuthorUnsubmitted Done Reply Inline Actions cusparse spGEMM APIs are very wired. i am merely restating what I learnt from its documentation but not sure if spgemm_compute only does the computation The functions cusparseSpGEMM_workEstimation(), cusparseSpGEMM_estimateMemory(), and cusparseSpGEMM_compute() are used for both determining the buffer size and performing the actual computation. K-Wu: cusparse spGEMM APIs are very wired. i am merely restating what I learnt from its documentation…
		aartbikUnsubmitted Not Done Reply Inline Actions how does this perform the actual computation? aartbik: how does this perform the actual computation?
		aartbikUnsubmitted Done Reply Inline Actions could we combine this with spgemm_estimate_memory to reduce the number of ops (i.e. if they always appear in sequence, we could keep the operation footprint a bit less compared to actual lib calls)? aartbik: could we combine this with spgemm_estimate_memory to reduce the number of ops (i.e. if they…
		wrengrUnsubmitted Done Reply Inline Actions And if you're worried that "even though they always go together now, perhaps we'll need to split them up later on", then that could always be done via giving the op a new argument `enum Estimate { Work, Memory, Both }`. ...That is, assuming the extra buffer arguments to the function calls can be easily hidden away as part of the op lowering. Since our goal is to build a compiler rather than to provide MLIR bindings to the library per se, I think it's better to try to keep the ops as high-level as we actually want/need for the compiler, rather than going for 1-to-1 correspondence with the library function calls. Keeping things at the granularity of the compiler often helps both to improve codegen and to avoid compatibility issues with different library versions. Granted there's always a balancing act for this sort of thing, but nevertheless wrengr: And if you're worried that "even though they always go together now, perhaps we'll need to…
		aartbikUnsubmitted Done Reply Inline Actions +1 on the compiler vs 1:1 lib argument, and the proposal for the future aartbik: +1 on the compiler vs 1:1 lib argument, and the proposal for the future
		wrengrUnsubmitted Done Reply Inline Actions Should wrap this in backticks so that it's formatted/rendered properly. Ditto for analogous documentation on other ops wrengr: Should wrap this in backticks so that it's formatted/rendered properly. Ditto for analogous…
		K-WuAuthorUnsubmitted Done Reply Inline Actions Got it. I will apply backticks in all the new operators in this revision, and create a dedicated revision to applying backtick to other tablegen entries and applying the declaration macro to other ops. K-Wu: Got it. I will apply backticks in all the new operators in this revision, and create a…
		wrengrUnsubmitted Done Reply Inline Actions Shouldn't there only be one of these triple-backtick lines? Ditto for the other ops wrengr: Shouldn't there only be one of these triple-backtick lines? Ditto for the other ops
		aartbikUnsubmitted Done Reply Inline Actions I would drop the "supported by sparse tensor", just "selected algorithm for SpGEMM", aartbik: I would drop the "supported by sparse tensor", just "selected algorithm for SpGEMM",
		wrengrUnsubmitted Done Reply Inline Actions nit: should reformat this to avoid overly long lines wrengr: nit: should reformat this to avoid overly long lines
		wrengrUnsubmitted Done Reply Inline Actions nit: reformat to avoid overly long lines wrengr: nit: reformat to avoid overly long lines

This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse][gpu] add spgemm operatorClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 532298

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

[mlir][sparse][gpu] add spgemm operator
ClosedPublic