This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse][gpu] a first prototype sparse GPU code generator
ClosedPublic

Authored by aartbik on Apr 3 2023, 4:21 PM.

Details

Summary

This adds a proof-of-concept GPU code generator
to the sparse compiler pipeline, currently only capable
of generating CUDA threads for outermost parallel loops.

The objective, obviously, is to grow this concept
into a full-blown GPU code generator, capable of the
right combination of code generation as well as exploiting
idiomatic kernels or vector specific libraries (think cuSPARSE).

Event Timeline

aartbik created this revision. Apr 3 2023, 4:21 PM
Herald added a project: Restricted Project.
aartbik requested review of this revision. Apr 3 2023, 4:21 PM
Peiming added inline comments. Apr 3 2023, 5:13 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
168

We should be able to support reductions, right? Maybe by rewriting parallel reduce => gpu.*_reduce? Maybe that is future work for you :-)

169–170

It seems we should be able to support this too by tweaking the thread mapping (though the parallel for generated by the sparse compiler will always use 0 as the lower bound and 1 as the step).

aartbik marked an inline comment as done. Apr 3 2023, 5:18 PM
aartbik added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
168

Agreed. CUDA has some very nifty reduction primitives we should use, but yeah, all future work.

The first step, for now, is to get the pipeline working. This revision has a CHECK test. After that, an end-to-end test.

Once we have the building blocks more or less in place, the fun starts!

(also, perhaps we don't want a forall rewriter, but bake this directly into loop emitter; I am not sure yet; the idioms, like 2:4 will need work along either path)
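To make the reduction idea for line 168 concrete, here is a minimal CPU-side sketch of the pairwise (tree) combination that block-level GPU reductions typically perform. It is only an illustration: `blockTreeReduce` is a hypothetical name, each vector element stands in for one thread's partial result, and nothing here is taken from the patch or uses actual CUDA or gpu dialect primitives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not code from the patch.
// perThread[i] plays the role of the partial sum held by thread i.
// Each round doubles the stride and folds pairs of partial sums together,
// so the number of "active threads" roughly halves per round.
double blockTreeReduce(std::vector<double> perThread) {
  const std::size_t n = perThread.size();
  if (n == 0)
    return 0.0;  // an empty reduction yields the additive identity
  for (std::size_t stride = 1; stride < n; stride *= 2) {
    // On a GPU, all iterations of this inner loop would run concurrently,
    // with a barrier between successive strides.
    for (std::size_t i = 0; i + stride < n; i += 2 * stride)
      perThread[i] += perThread[i + stride];
  }
  return perThread[0];  // thread 0 ends up holding the total
}
```

The sketch handles sizes that are not a power of two by skipping pairs whose partner index falls outside the vector.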

169–170

Yeah, agreed. I kept the computation simple (for now), but we really should be able to support even for (i = lo; i < hi; i += step) eventually, with some more work.
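As a rough sketch of what a thread mapping for the general loop form could look like, the following C++ helper computes which iterations of for (i = lo; i < hi; i += step) a given thread would execute under a cyclic distribution. The helper name `iterationsForThread` and its parameters are assumptions made for illustration; this is not the mapping implemented in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not code from the patch.
// Returns the iterations of
//   for (i = lo; i < hi; i += step)
// executed by the thread with linear id `tid` when `numThreads` threads
// share the loop cyclically.
std::vector<int64_t> iterationsForThread(int64_t lo, int64_t hi, int64_t step,
                                         int64_t tid, int64_t numThreads) {
  assert(step > 0 && numThreads > 0 && tid >= 0 && tid < numThreads);
  std::vector<int64_t> result;
  // Thread `tid` starts at its own offset from `lo` and then skips over
  // the iterations owned by the other threads.
  for (int64_t i = lo + tid * step; i < hi; i += numThreads * step)
    result.push_back(i);
  return result;
}
```

With lo = 0 and step = 1, the case the sparse compiler currently generates, this reduces to i = tid; i < hi; i += numThreads.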

ThomasRaoux accepted this revision. Apr 5 2023, 10:16 AM

Looks good to me. It looks like there is more work to do to get to something performant (memory and block distribution are inefficient). But that's a great start!

This revision is now accepted and ready to land. Apr 5 2023, 10:16 AM
aartbik marked an inline comment as done. Apr 5 2023, 10:38 AM

Looks good to me. It looks like there is more work to do to get to something performant (memory and block distribution are inefficient). But that's a great start!

Absolutely (in fact, some articles point out this is absolutely *not* the right way to make SpMV parallel :-).
But this prototype is incremental, i.e. first get the pipeline up and running, then get an idea on the basic building blocks required, and then go from there!
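For readers unfamiliar with the pattern under discussion, here is a small CPU-side sketch of SpMV over a CSR matrix where the outermost (row) loop is the parallel one, i.e. one thread per row. The struct and function names are hypothetical and the code is not taken from the patch. It also hints at why this mapping is often considered a poor fit for GPUs: rows with very different numbers of nonzeros give threads very different amounts of work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container, named here for illustration only.
struct CsrMatrix {
  std::vector<std::size_t> positions;    // row r owns entries [positions[r], positions[r+1])
  std::vector<std::size_t> coordinates;  // column index of each stored entry
  std::vector<double> values;            // value of each stored entry
};

// Work done by the single "thread" assigned to `row`.
void spmvRow(const CsrMatrix &a, const std::vector<double> &x,
             std::vector<double> &y, std::size_t row) {
  double sum = 0.0;
  for (std::size_t k = a.positions[row]; k < a.positions[row + 1]; ++k)
    sum += a.values[k] * x[a.coordinates[k]];
  y[row] = sum;  // each row writes a distinct element, so no synchronization is needed
}

// y = A * x. The row loop is the outermost parallel loop; on a GPU each
// iteration would be handed to its own thread.
std::vector<double> spmv(const CsrMatrix &a, const std::vector<double> &x) {
  assert(!a.positions.empty());
  const std::size_t numRows = a.positions.size() - 1;
  std::vector<double> y(numRows, 0.0);
  for (std::size_t row = 0; row < numRows; ++row)
    spmvRow(a, x, y, row);
  return y;
}
```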

aartbik updated this revision to Diff 511152. Apr 5 2023, 10:44 AM

rebased with main

This revision was landed with ongoing or failed builds. Apr 5 2023, 11:32 AM
This revision was automatically updated to reflect the committed changes.