Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp | | |
---|---|---|
798–808 ↗ | (On Diff #542604) | I somehow feel that we have repeated this code so many times; do you think you can make it a macro instead? |
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp | | |
---|---|---|
798–808 ↗ | (On Diff #542604) | Good point. I will think about it. |
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
2186 | | TODO: remove |
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td | | |
---|---|---|
125–129 ↗ | (On Diff #542744) | This code has been repeated a bunch of times now. You should define a new tblgen class so that you can then simply say `def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SparseSpGEMMOpHandleType, "SpGEMM operation">;`, and analogously for GPU_SparseSpMatHandle and GPU_SparseDnTensorHandle. Even better, it'd be nice to hook that GPU_SparseHandle tblgen class in with the SparseHandleKind enum (also moving that enum from C++ to tblgen), so that you can further simplify that to `def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SpGEMMOp>;`, where the "SpGEMM operation" string is given as the description/summary of the SpGEMMOp member of the tblgen enum. |
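A minimal TableGen sketch of the first suggested simplification, assuming the existing handle defs follow the usual `DialectType`/`CPred` pattern; the class shape and the type names passed in are illustrative assumptions, not the actual code in GPUBase.td:

```tablegen
// Hypothetical sketch: factor the repeated DialectType/CPred boilerplate into
// one parameterized class (names here are assumptions for illustration).
class GPU_SparseHandle<string cppType, string summary> :
    DialectType<GPU_Dialect,
      CPred<"::llvm::isa<::mlir::gpu::" # cppType # ">($_self)">,
      summary # " handle type">;

// Each concrete handle is then a one-liner.
def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<"SparseSpGEMMOpHandleType", "SpGEMM operation">;
def GPU_SparseSpMatHandle    : GPU_SparseHandle<"SparseSpMatHandleType", "sparse matrix">;
def GPU_SparseDnTensorHandle : GPU_SparseHandle<"SparseDnTensorHandleType", "dense tensor">;
```

The second, enum-based variant described above would additionally require moving the SparseHandleKind enum into tablegen so the class can derive the summary string from the enum case.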
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
1864–1866 | | Does CUDA specify this discrepancy between the numeric value and the algorithm name? If it doesn't, then you may want to make the numeric values match the algorithm name for better clarity. |
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
2189 | | This description needs to be updated to actually say something about what the op is/does. Ditto for all the other new ops. |
It feels like some parts are still missing here. Also, do we need the full fine-grained GPU dialect op <-> cusparse lib call correspondence?
Can we start by merging a few of the set-up lib calls into one gpu dialect op for now (similar to what we did for cuSPARSELt 2:4)?
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td | | |
---|---|---|
125–129 ↗ | (On Diff #542744) | +1 on at least the first improvement (at first glance I was a bit surprised to see a handle for an operation and not the data, but after reading GEMM in detail, I get it ;-) |
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
1864–1866 | | Yeah, it seems even stranger. But perhaps we can keep the 3, 4, 5 directly here (see the tblgen sketch after this table)? `typedef enum { CUSPARSE_SPGEMM_DEFAULT = 0, CUSPARSE_SPGEMM_CSR_ALG_DETERMINITIC = 1, CUSPARSE_SPGEMM_CSR_ALG_NONDETERMINITIC = 2, CUSPARSE_SPGEMM_ALG1 = 3, CUSPARSE_SPGEMM_ALG2 = 4, CUSPARSE_SPGEMM_ALG3 = 5 } cusparseSpGEMMAlg_t;` |
1874 | | Why? The doc says "Default algorithm. Currently, it is CUSPARSE_SPGEMM_ALG1." |
2189 | | +1 |
2210 | | Seems like you did not finish the description? |
2215 | | Missing argument, no return? `gpu.spgemm_destroy_descr %descriptor` |
2285 | | Missing? |
2290 | | Seems not right? |
2403 | | This is the copy op? |
2417 | | Not complete? |
2460 | | Not copy? |
2462 | | Get size? |
2496 | | Add an empty line for spacing. |
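A sketch of what the "keep 3, 4, 5 directly" suggestion for lines 1864–1866 could look like in tablegen; the `GPU_SpGEMMAlg*` case defs are hypothetical names, and only the numeric values are taken from `cusparseSpGEMMAlg_t` above:

```tablegen
// Hypothetical sketch: mirror cusparseSpGEMMAlg_t's numeric values so the
// lowering can forward the attribute value to the runtime wrapper unchanged.
def GPU_SpGEMMAlgDefault : I32EnumAttrCase<"DEFAULT", 0, "DEFAULT">;
def GPU_SpGEMMAlg1       : I32EnumAttrCase<"ALG1", 3, "ALG1">;
def GPU_SpGEMMAlg2       : I32EnumAttrCase<"ALG2", 4, "ALG2">;
def GPU_SpGEMMAlg3       : I32EnumAttrCase<"ALG3", 5, "ALG3">;

def GPU_SpGEMMAlgAttr : I32EnumAttr<"SpGEMMAlg", "SpGEMM algorithm selection",
    [GPU_SpGEMMAlgDefault, GPU_SpGEMMAlg1, GPU_SpGEMMAlg2, GPU_SpGEMMAlg3]> {
  let cppNamespace = "::mlir::gpu";
}
```

Whether DEFAULT and the CSR_ALG_* variants belong in the attribute is exactly what the later comments on these lines discuss.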
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td | | |
---|---|---|
120 ↗ | (On Diff #546254) | "SpGEMM operation" -> "SpGEMM operation handle type" |
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
1874 | | Doesn't this fit on one line now? Or is this forced by lint? |
2189 | | Can we say a bit more? The doc now pretty much says what the name already implies. |
2240 | | Could we combine this with `spgemm_estimate_memory` to reduce the number of ops? |
2253 | | Can we break this over multiple lines to stay within 80 columns? |
2301 | | How does this perform the actual computation? |
2360 | | Wait, this is no longer used to *determine* the buffer size, right? |
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp | | |
---|---|---|
798–808 ↗ | (On Diff #542604) | Yeah, I am also thinking that keeping the definition and declaration together would already go a long way toward reducing the boilerplate. |
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp | | |
---|---|---|
481 ↗ | (On Diff #546254) | We repeat this TODO everywhere; shall we just mention it once in the alpha/beta macro? |
mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir | | |
---|---|---|
123 ↗ | (On Diff #546254) | Break long line. |
128 ↗ | (On Diff #546254) | And here. |
mlir/test/Dialect/GPU/sparse-roundtrip.mlir | | |
---|---|---|
95 ↗ | (On Diff #546254) | Break. |
100 ↗ | (On Diff #546254) | Break. |
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
1864–1866 | | Is there any reason not to also include DEFAULT? Especially if we do indeed make it the `GPU_SpGEMMAlgAttr.defaultValue`... (I'd suggest also adding CSR_ALG_{,NON}DETERMINITIC, though that'd require some extra validation to make sure that the arguments are CSR whenever those algos are requested. Though I'd imagine that we already have similar requirements for Alg{1,2,3}...) |
2240 | | And if you're worried that "even though they always go together now, perhaps we'll need to split them up later on", then that could always be done by giving the op a new argument `enum Estimate { Work, Memory, Both }` (see the sketch below). ...That is, assuming the extra buffer arguments to the function calls can be easily hidden away as part of the op lowering. Since our goal is to build a compiler rather than to provide MLIR bindings to the library per se, I think it's better to try to keep the ops as high-level as we actually want/need for the compiler, rather than going for 1-to-1 correspondence with the library function calls. Keeping things at the granularity of the compiler often helps both to improve codegen and to avoid compatibility issues with different library versions. Granted, there's always a balancing act for this sort of thing, but nevertheless. |
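A minimal tablegen sketch of the suggested `Estimate` argument, assuming it is modeled as an enum attribute; the def and case names below are hypothetical, and the combined op itself (and its buffer handling in the lowering) is not shown:

```tablegen
// Hypothetical sketch of the Estimate { Work, Memory, Both } argument, so one
// op can cover both the work-estimation and the memory-estimation step.
def GPU_SpGEMMEstimateWork   : I32EnumAttrCase<"WORK", 0, "work">;
def GPU_SpGEMMEstimateMemory : I32EnumAttrCase<"MEMORY", 1, "memory">;
def GPU_SpGEMMEstimateBoth   : I32EnumAttrCase<"BOTH", 2, "both">;

def GPU_SpGEMMEstimateAttr : I32EnumAttr<"SpGEMMEstimate",
    "which estimate a combined SpGEMM estimation op should produce",
    [GPU_SpGEMMEstimateWork, GPU_SpGEMMEstimateMemory, GPU_SpGEMMEstimateBoth]> {
  let cppNamespace = "::mlir::gpu";
}
```

The extra buffer arguments the underlying library calls take would then be materialized by the lowering, in line with the compiler-granularity argument above.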
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp | | |
---|---|---|
311–317 ↗ | (On Diff #546254) | I'm not sure what style is used in the rest of the file/dialect, but is it really necessary/helpful to provide the C/C++ types for each of the arguments, given that they have an easy 1-to-1 mapping to the LLVM types? Giving the names of the arguments makes sense, but the types less so imo. |
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td | | |
---|---|---|
120 ↗ | (On Diff #546254) | Yeah, that would work (the full description would then only be visible on the doc page, and not inline in the code, but I like that this automated way enforces consistency). |
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
2240 | | +1 on the compiler vs. 1:1 lib argument, and on the proposal for the future. |
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
2360 | | The cuSPARSE SpGEMM APIs are very weird. I am merely restating what I learnt from its documentation, but I am not sure whether `spgemm_compute` only does the computation. |
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
1873 | | Nit: there should be a space between the `>` and the `{`. |
2193 | | Should wrap this in backticks so that it's formatted/rendered properly. Ditto for analogous documentation on other ops. |
2199–2200 | | Shouldn't there only be one of these triple-backtick lines? Ditto for the other ops. |
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp | | |
---|---|---|
343–347 ↗ | (On Diff #546943) | Should drop the `void*` part of the comments here, for consistency. |
addressing some comments
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | | |
---|---|---|
2193 | | Got it. I will apply backticks to all the new ops in this revision, and create a dedicated revision to apply backticks to the other tablegen entries and the declaration macro to other ops. |
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp | | |
---|---|---|
711 ↗ | (On Diff #547942) | We can use this for ALL rules now, right? I can see you don't want to do that in this CL... |
311–317 ↗ | (On Diff #546254) | I agree we should find an automated way for this, for both the types and the names. As for the names, right now it is inconsistent with the other parts, since we only documented `stream` for some reason ;-) So it's okay to leave what you have now, but we probably need to revisit this one day. |
addressing comments
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp | | |
---|---|---|
311–317 ↗ | (On Diff #546254) | noted :D |