The sparse compiler now has two prototype strategies for GPU acceleration:
- CUDA codegen: this converts sparsified code to CUDA threads
- CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls
This revision introduces the first steps required for the second approach.
Is this comment up to date? It looks like you are generating GPU dialect ops.