Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matmul-2-4-lib.mlir

| Line | Comment |
|---|---|
| 18 | This seems a bit copy-and-paste from the sparse-mma-2-4-f16.mlir test (which really uses device code for this method by means of e.g. nvgpu.mma.sp.sync). Here, however, the library calls are still made from the host. So I would remove the whole device/host comments here (at L17 and at L62). Also, the gpu.container_module is not needed, since no method is defined as a gpu.module. |
| 35 | Commented out code? |
| 208 | Avoid commented out code. |
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matmul-2-4-lib.mlir

| Line | Comment |
|---|---|
| 18 | Thanks for all these comments! They are all addressed now. |
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matmul-2-4-lib.mlir

| Line | Comment |
|---|---|
| 6 | It looks like this pipeline can be simplified quite a bit; all the gpu.module(....) passes can go, right? |
| 16 | Remove gpu.container_module. |
| 26 | Add a comment to the magic constant here. |
| 43 | Does it work without? In any case, make the TODO jump out a bit more. |
| 120 | Copy-and-paste comment: this is no longer the compressed matrix, but the full 2:4 matrix A. |
| 146 | Empty // line after this comment to separate it from the CHECK. |
| 198 | There are no warps in this code, so simply say "Call the kernel". |
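The comment at L120 distinguishes the full 2:4 matrix A from its compressed form. As a rough illustration (not the actual cuSPARSELt metadata encoding, and `compress_2_to_4` is a hypothetical helper), in 2:4 structured sparsity every group of four consecutive elements along a row holds at most two nonzeros; the compressed form keeps only those two values per group, plus metadata recording their positions within the group:

```python
import numpy as np

def compress_2_to_4(a):
    """Compress a row-major 2:4 structured-sparse matrix.

    Returns (values, meta): `values` keeps 2 entries per group of 4,
    `meta` records each kept entry's position (0..3) within its group.
    Illustrative sketch only; real libraries pack `meta` into bitfields.
    """
    rows, cols = a.shape
    assert cols % 4 == 0, "column count must be a multiple of 4"
    values = np.zeros((rows, cols // 2), dtype=a.dtype)
    meta = np.zeros((rows, cols // 2), dtype=np.int8)
    for i in range(rows):
        for g in range(cols // 4):
            group = a[i, 4 * g : 4 * g + 4]
            nz = list(np.flatnonzero(group))
            assert len(nz) <= 2, "matrix violates 2:4 sparsity"
            while len(nz) < 2:  # pad with an unused slot (stored value is 0)
                nz.append(next(p for p in range(4) if p not in nz))
            nz.sort()
            for k, p in enumerate(nz):
                values[i, 2 * g + k] = group[p]
                meta[i, 2 * g + k] = p
    return values, meta

# A 1x8 row with 2 nonzeros in each group of 4:
a = np.array([[1, 0, 0, 2, 0, 3, 4, 0]], dtype=np.float32)
values, meta = compress_2_to_4(a)
# values -> [[1, 2, 3, 4]], meta -> [[0, 3, 1, 2]]
```

This makes the reviewer's point concrete: the test at L120 builds the full matrix `a`, not the half-width `values` array that the compressed representation would hold.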
It looks like this pipeline can be simplified quite a bit; all the gpu.module(....) passes can go, right?
Also, the vector-to-llvm conversion and probably more. Perhaps you can actually get rid of the first mlir-opt call and just start at L7 (a bit hard to tell just by looking, but run it by hand and see how far you can strip it).