Work towards fixing: https://github.com/llvm/llvm-project/issues/51652
Depends On D122928
Differential D122060
[mlir][sparse] Factoring out an enumerator over elements of SparseTensorStorage wrengr on Mar 18 2022, 8:07 PM. Authored by
Details
Diff Detail
Event Timeline
Comment Actions Ah, those were notes-to-self for navigating the file during development. I think after D122061 lands we should split the file up to make navigation easier, though I'm still working out where the best splits would be. Comment Actions Factored out D122928 to address the request for reorganization
Comment Actions Cleaning up how EnumerableSparseTensorStorage delegates to constructors of its base class. Comment Actions Rebasing for D123166. Also removing a bunch of inline keywords, per MLIR style-guide.
Comment Actions Thanks Wren. If the performance results are good, this is getting close to LGTM.
Comment Actions Removed the intermediate class EnumerableSparseTensorStorage<V>. I finally figured out how to reuse SparseTensorStorageBase::getValues in lieu of the EnumerableSparseTensorStorage<V>::getValue method. So I've moved the other two methods into the SparseTensorStorageBase class itself. This loses a little bit of type safety since the SparseTensorEnumerator constructor now has a new type invariant. But it means the SparseTensorStorage class no longer needs to qualify all the inherited methods, which is a considerable amount of cleanup.
Comment Actions Unfortunately the benchmarks have been... annoying. I finished writing them up and ran them at the end of week last week, and they showed <1% slowdown. I was going to post a comment to that effect on monday, but I wanted to rerun them just to be sure— and even though I hadn't touched the code (neither for this CL, nor for the benchmark itself, nor rebasing for the recent upstream changes) suddenly it was showing 2~15% slowdown. Which has undermined my belief in the credibility of the benchmarks. So this week I've been trying to figure out how to improve the reliability of the benchmarks, as well as trying to track down where the slowdown is coming from (assuming it's not spurious). Comment Actions After banging away at things, I seem to have come up with a version that has -4.82~-6.79% slowdown (i.e., 5~7% speedup). I need to check a few more things to make sure these results are actually valid, then I'll upload the new version. Comment Actions Refactoring to minimize overhead (namely splitting the enumerator class up so that we can avoid the cost of virtual-method calls within the loop-nest). Current benchmarks indicate this differential has no statistically significant difference in cpu time compared to the baseline; or on occasion is somewhat faster than the baseline. Also rebasing to incorporate recent changes (D124502, D124875, D124475).
Comment Actions Thanks for your patience during the review, Wren. It has been a long road, but nice to see this new abstraction! |
Note that the //==---- style of separation was only used for comments that introduce a whole new section. This is the first time it is used for just a class comment and it feels a bit out of place. So I would remove L239.