Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
mlir/test/Dialect/SparseTensor/GPU/gpu_sampled_matmul_lib.mlir
  88: Can we instead output the #CSR type here?
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  878: Rewrite rule here.
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
  43: TODO: add checking logic
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
  43: Didn't notice there is already checking logic down below. Closing this.
mlir/test/Dialect/SparseTensor/GPU/gpu_sampled_matmul_lib.mlir
  98:
```mlir
%1 = sparse_tensor_in_place_result %args : tensor<8x8xf64, #SM>  // something new, I have to think about it
%2 = linalg.generic #trait_sampled_dense_dense
  ins(%args, %arga, %argb : tensor<8x8xf64, #SM>, tensor<8x8xf64>, tensor<8x8xf64>)
  outs(%1 : tensor<8x8xf64, #SM>) {
  ^bb(%s: f64, %a: f64, %b: f64, %x: f64):
    %p = arith.mulf %a, %b : f64
    %q = arith.mulf %s, %p : f64
    %r = arith.addf %x, %q : f64
    linalg.yield %r : f64
} -> tensor<8x8xf64, #SM>
return %2 : tensor<8x8xf64, #SM>
```
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  623: We test that C is sparse, but there is no test at all for D. So the code below will sometimes crash, since it uses information on C, not D.
  628: This comment says more than is done below; is this a note to self, or something else?
  632: You are missing the d : row here.
  644: This assumes that D already has exactly the same pattern as C (or, more precisely, the right length for all the arrays into which we copy the result). Perhaps we should start with simply returning C again after this operation, and find a way to allow this at the IR level.
  723: This is the wrong replacement; we need rewriter.replaceOpWithNewOp<sparse_tensor::LoadOp>(op, ....), and probably without inserts (a minimal sketch of that shape follows right after this list).
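Purely as an illustration of the shape suggested in the 723 comment above: op would be the matched linalg.generic and matC the in-place sparse output created by the codegen; both names are placeholders, not code from the patch.

```cpp
// Rematerialize the sparse result with sparse_tensor.load rather than
// forwarding the buffer directly; no hasInserts, since SDDMM only updates
// existing nonzero entries and never inserts new ones.
rewriter.replaceOpWithNewOp<sparse_tensor::LoadOp>(op, matC);
```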
mlir/test/Dialect/SparseTensor/GPU/gpu_sampled_matmul_lib.mlir
  111: Can you remove this part with reuse completely? I think we will just need the pattern above for proper testing.
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
  47: Dense output does not go through the GPU kernel then, right?
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  617: Please fix the 80-column breaks in the comment here.
  619: I think we need an if (op.getDpsInitOperand(0) != c) here to enforce the ad-hoc pattern.
mlir/test/Dialect/SparseTensor/GPU/gpu_sampled_matmul_lib.mlir
  105: Perhaps add "TODO: find a cleaner way to express this".
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
  48: Put the comment about the ad-hoc idiom here too, so we don't forget.
  93: Just curious: do you really want to use the from-file setup of the sampling matrix? We can also just create it in memory like so many other examples, right?
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  619: Or, probably more precisely: after preprocessing we have %0 = bufferization.alloc_tensor() copy(%arg0), so op.getDpsInitOperand(0).getDefiningOp() should link to %0 = bufferization.alloc_tensor() copy(%c).
  723: Make sure to return sparse_tensor.load %arg0 instead of just %arg0, since the rematerialization needs to be made explicit. (A sketch of the defining-op check from 619 follows below.)
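A minimal sketch of the check described in the 619 comments, assuming op is the matched linalg.generic and c is the sampling matrix; the helper name and includes are illustrative, not the actual code in SparseGPUCodegen.cpp.

```cpp
#include "mlir/Dialect/Bufferization/IR/Bufferization.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"

using namespace mlir;

// Returns true if the output of the generic op is an alloc_tensor that
// copies the sampling matrix `c`, i.e. the ad-hoc in-place SDDMM idiom
//   %0 = bufferization.alloc_tensor() copy(%c)
//   ... = linalg.generic ... outs(%0 : ...)
static bool isAdmissibleSDDMMOutput(linalg::GenericOp op, Value c) {
  Value out = op.getDpsInitOperand(0)->get();
  auto alloc = out.getDefiningOp<bufferization::AllocTensorOp>();
  return alloc && alloc.getCopy() == c;
}
```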
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
  93: Okay, I will incorporate it next time.
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  334–351: The nesting is too deep here. Could you either break it into more functions or use early returns on some simple conditions? E.g. if (!isa_and_nonnull(def)) return false; followed by Value s_out = xxx, and so on (a fuller sketch of this shape follows below).
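A slightly fuller sketch of the early-return shape suggested above; the helper name, the specific op type checked, and sOut are placeholders for whatever the actual matcher computes.

```cpp
static bool isAdmissibleSampled(linalg::GenericOp op) {
  // Bail out early on simple structural conditions instead of nesting.
  Value sOut = op.getDpsInitOperand(0)->get();
  Operation *def = sOut.getDefiningOp();
  if (!isa_and_nonnull<bufferization::AllocTensorOp>(def))
    return false;
  // ... remaining checks on sOut and def, each with its own early return ...
  return true;
}
```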
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
  48: We can now use https://reviews.llvm.org/D152969.
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
  274 (On Diff #531523): Actually, with the new formulation you no longer need this, since this is the proper S(I,J) = SPY( S(I,J) ) x SUM_K A(I,K) B(K,J) * alpha + beta * S(I,J).
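For readability, the same formula rendered in LaTeX; the spy(·) mask and the alpha/beta scaling are exactly as written in the comment above.

```latex
S_{ij} \;=\; \operatorname{spy}\bigl(S_{ij}\bigr)\cdot
             \alpha \sum_{k} A_{ik}\,B_{kj} \;+\; \beta\, S_{ij}
```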
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  334–351: Good point and example! Incorporated.
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  335: You need to do a null check too.
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  332: Add a comment // Helper to detect .....
  344: return s_out == use->getOperand(0); ?
  348: This seems way too simple. You pretty much match any %1 = sparse_tensor.unary, but you don't check that %1 produces A * B and that %2 produces the sum accumulation. It will match the current example, but also much more! Please fix (one possible tightening is sketched after this list).
  641: Remove this TODO. The current idiom has in-place semantics that will be correctly dealt with by bufferization.
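One possible way to tighten the match hinted at in the 348 comment; the helper names are made up for this sketch, and a complete check would also verify which block arguments feed the multiplication.

```cpp
// Returns true if the single block of `region` yields a value produced by
// an operation of type OpTy.
template <typename OpTy>
static bool yieldsOp(Region &region) {
  if (region.empty())
    return false;
  Operation *yield = region.front().getTerminator();
  if (yield->getNumOperands() != 1)
    return false;
  return isa_and_nonnull<OpTy>(yield->getOperand(0).getDefiningOp());
}

// The unary must compute the A*B product in its present region, and the
// reduce must accumulate with an addition; otherwise the pattern matches
// far more than the sampled dense-dense multiplication.
static bool isSampledMulAdd(sparse_tensor::UnaryOp unary,
                            sparse_tensor::ReduceOp reduce) {
  return yieldsOp<arith::MulFOp>(unary.getPresentRegion()) &&
         yieldsOp<arith::AddFOp>(reduce.getRegion());
}
```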
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-sampled-matmul-lib.mlir
  116: Releasing %0 feels cleaner, so that the in-place behavior is clear.
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  335: You will often find the idiom if (Operation *def = ....getDefiningOp()) { use def } for this purpose, which you can even combine with the isa check: if (auto def = ....getDefiningOp<...>()) (illustrated below).
  353: Technically, you have not checked that the unary feeds into the reduce, right?
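An illustration of the idiom mentioned in the 335 comment; the value v and the op type are just examples, not code from the patch.

```cpp
// Plain form: def is non-null inside the body, and scoped to it.
if (Operation *def = v.getDefiningOp()) {
  // use def
}

// Combined with the type check: alloc is already the requested op type.
if (auto alloc = v.getDefiningOp<bufferization::AllocTensorOp>()) {
  // use alloc
}
```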
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
  344: This is checked two times in the loop, so I don't find an easy way to simplify the logic.