This revision extends the GPU dialect with ops that can be lowered to
host-oriented sparse matrix library calls (focused on cuSPARSE in this case,
although the ops could in principle be generalized to support more GPUs).
This will allow the "sparse compiler pipeline" to accelerate sparse operations
(see the follow-up revisions for examples).
For some background, see:
https://discourse.llvm.org/t/sparse-compiler-and-gpu-code-generation/69786/2
A question for the reviewers is, of course, whether this is an acceptable
dependency: cuSPARSE has been part of CUDA for a long time now, and it is only
pulled in when building with the extra flag MLIR_ENABLE_CUDA_RUNNER.