The host registration is a convenient way to get CUDA kernels
running, but it may be slow and does not work for all buffers
(e.g., global constants). This revision uses proper
alloc/copy/dealloc chains for buffers, made asynchronous
to increase overlap. The host registration mechanism is
kept under a flag for the output, purely for experimentation
while this project ramps up.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp

Line | Comment
---|---
132–133 | fixed the comment
183 | I agree this was somewhat hidden "commented out" code. I added a TODO to investigate this and/or couple it with a compiler option (I really want to keep the code around for fast experimentation, also with our intern starting soon ;-)
332 | note that the TODO is here
This reads weird... maybe remove the first "that"?