This is an archive of the discontinued LLVM Phabricator instance.

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
343	Note that the //==---- style of separation was only used for comments that introduce a whole new section. This is the first time it is used for just a class comment and it feels a bit out of place. So I would remove L239.
349	Since we are moving towards documenting more, perhaps some more information around L174, the base class, would be in order now too.
351	I like how you place this V-interface in between the typeless base class and the full tensor storage, very elegant!
355–356	Please do not refer to design doc (which is not accessible to outside world most likely and may go out of date). Also, in general, let's just document what we did, not what we could have done ;-)
363	How about making the members rev/sizes etc. part of the base class and making these non-virtual, inline getters instead? It will require a "super" constructor, of course, but it would make it a bit clearer who owns what.
478	If we start using such "end class" comments, let's do it everywhere.

Ah, those were notes-to-self for navigating the file during development. I think after D122061 lands we should split the file up to make navigation easier, though I'm still working out where the best splits would be.

wrengr mentioned this in D122928: [mlir][sparse] Moving <P,I,V>-invariant parts of SparseTensorStorage to base. Apr 1 2022, 11:55 AM

Factored out D122928 to address the request for reorganization

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
349	Any particular documentation you'd like to see there?
355–356	I actually left this note for you :) That is, so that once you got a chance to see the final code we could revisit the designs we talked about and decide which one to go with :)
363	sgtm

Harbormaster completed remote builds in B157476: Diff 419834. Apr 1 2022, 12:47 PM

wrengr edited the summary of this revision. Apr 1 2022, 12:47 PM

wrengr added a parent revision: D122928: [mlir][sparse] Moving <P,I,V>-invariant parts of SparseTensorStorage to base.

wrengr removed a parent revision: D122059: [mlir][sparse] Marking several things const/static.

wrengr added a child revision: D122936: [mlir][sparse] Moved the ElementConsumer typedef to a "type alias". Apr 1 2022, 1:41 PM

Cleaning up how EnumerableSparseTensorStorage delegates to constructors of its base class.

Harbormaster completed remote builds in B157511: Diff 419875. Apr 1 2022, 4:06 PM

rebase

wrengr removed a child revision: D122061: [mlir][sparse] Enhancing sparse=>sparse conversion.. Apr 4 2022, 5:20 PM

wrengr added a child revision: D122061: [mlir][sparse] Enhancing sparse=>sparse conversion.. Apr 4 2022, 5:20 PM

Harbormaster completed remote builds in B157868: Diff 420353. Apr 4 2022, 5:47 PM

rebase

Harbormaster completed remote builds in B157880: Diff 420368. Apr 4 2022, 7:43 PM

wrengr removed a child revision: D122061: [mlir][sparse] Enhancing sparse=>sparse conversion.. Apr 5 2022, 12:27 PM

Rebasing for D123166. Also removing a bunch of inline keywords, per MLIR style-guide.

wrengr added a child revision: D122061: [mlir][sparse] Enhancing sparse=>sparse conversion.. Apr 6 2022, 5:44 PM

Harbormaster completed remote builds in B158369: Diff 421051. Apr 6 2022, 5:52 PM

aartbik added inline comments. Apr 8 2022, 10:20 AM

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
374	Refresh my C++ knowledge ;-), but do we really need those Base:: qualifiers now? It is a method defined in a superclass, so shouldn't all the magic just work?

wrengr mentioned this in rG8d8b566f0c66: [mlir][sparse] Moving <P,I,V>-invariant parts of SparseTensorStorage to base. Apr 8 2022, 11:44 AM

wrengr added inline comments. Apr 8 2022, 11:51 AM

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
374	Alas we do :( Since the superclass is templated, C++ refuses to do unqualified name lookup for superclass methods. https://www.cs.technion.ac.il/users/yechiel/c++-faq/nondependent-name-lookup-members.html
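
To illustrate the lookup rule mentioned for line 374, here is a minimal sketch. The names `StorageBase` and `Storage` are hypothetical stand-ins, not the classes in SparseTensorUtils.cpp. The only point is that a member of a dependent (templated) base class is not found by an unqualified call, so the call has to be spelled `Base::getRank()` or `this->getRank()`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical templated base class (illustration only).
template <typename V>
class StorageBase {
public:
  explicit StorageBase(std::vector<uint64_t> szs) : sizes(std::move(szs)) {
    assert(!sizes.empty() && "Trivial shape is unsupported");
  }
  uint64_t getRank() const { return sizes.size(); }

private:
  std::vector<uint64_t> sizes;
};

// Hypothetical derived class whose base depends on the template parameter V.
template <typename V>
class Storage : public StorageBase<V> {
  using Base = StorageBase<V>;

public:
  explicit Storage(std::vector<uint64_t> szs) : Base(std::move(szs)) {}

  // Qualifying with the base class makes the name dependent, so lookup is
  // deferred until the template is instantiated.
  uint64_t rankViaBase() const { return Base::getRank(); }

  // `this->` has the same effect.
  uint64_t rankViaThis() const { return this->getRank(); }

  // An unqualified call is looked up when the template is defined, where the
  // dependent base is not searched, so this would fail to compile:
  // uint64_t rankUnqualified() const { return getRank(); }
};
```

If the base class is not a template (as after the later refactoring in this review), the unqualified call works and the qualifiers can be dropped.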

wrengr marked an inline comment as done. Apr 8 2022, 11:51 AM

aartbik added inline comments. Apr 12 2022, 5:52 PM

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
381–390	do we use this constructor currently? if not, then please add later when used
411–414	given that this is not a public facing class, do we need to explicitly delete all of these (I assume you use this so you don't accidentally do the wrong thing)
453	I still find the original code, which first restores the original sizes, a bit more intuitive (during debugging, I always check the factory point as entry to see if all is right), and it allows us to use the factory method newSparseTensorCOO exclusively, rather than introducing a direct new here. Given how much overhead you introduce elsewhere, is saving the extra loop really worth it?
473	Given that toCOO() is rather central to a lot of previously measured performance operations, can you please do a before/after measurement with some large tensors (see e.g. our pre-print paper), just to make sure that the use of an ElementConsumer callback does not introduce too much overhead?
640	does this need to be protected, or can it even be private, given that you "friend" this?

Addressing comments, fixing an issue about "slicing", and incorporating D122936
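
As a rough illustration of the "slicing" issue mentioned above: when a polymorphic base class is publicly copyable, copying a derived object through a base reference silently drops the derived part. A common remedy, the one described in the C++ Core Guidelines, is to make the base class's copy operations protected or deleted. The classes below are hypothetical and only sketch the idea; they are not the ones in this revision.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical polymorphic base with a data member.
class Base {
public:
  virtual ~Base() = default;
  virtual int kind() const { return 0; }

protected:
  Base() = default;
  // Copy construction is reachable only from derived classes.
  Base(const Base &) = default;
  // Copy assignment is disallowed entirely.
  Base &operator=(const Base &) = delete;

private:
  int rank = 1;
};

class Derived : public Base {
public:
  Derived() = default;
  Derived(const Derived &) = default;
  int kind() const override { return 1; }
};

// Outside code cannot copy a Base, so it cannot slice a Derived into one.
static_assert(!std::is_copy_constructible<Base>::value,
              "Base must not be publicly copyable");
static_assert(std::is_copy_constructible<Derived>::value,
              "Derived remains copyable");

// Usage sketch: copy at the derived level, refer to the base by reference.
inline void slicingExample() {
  Derived d;
  Derived copy(d); // OK: Derived's copy constructor may call Base's.
  const Base &ref = copy;
  assert(ref.kind() == 1);
  // Base sliced = d; // Would not compile: Base's copy constructor is protected.
}
```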

wrengr added inline comments. Apr 13 2022, 3:09 PM

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
411–414	Yep, these are to help prevent doing the wrong thing, since this class captures a reference that could become dangling. IMO it doesn't matter whether the class is public-facing or not; those of us working on the library are only human and thus fallible, so it's still beneficial to get the compiler to help guard against mistakes (especially since there's zero runtime cost to doing so!). Though it does look like it's sufficient to only explicitly delete the copy versions, since then the move versions will implicitly fall back to the defined(-as-deleted) copy versions.
453	This part of the reorg isn't about improving performance (though that is a consequence), it's about having the right abstractions. I don't think `newSparseTensorCOO` is a particularly good abstraction. The only thing it does is construct the pushforward array (which is itself a very good abstraction that's also used in several other places), assert non-zero sizes (which can be abstracted into several other places), and then call the constructor. If we factor out the code for constructing pushforward arrays, then `newSparseTensorCOO` is just a macro: `newSparseTensorCOO(rank, sizes, perm, cap) === new SparseTensorCOO(pushforward(rank, perm, sizes), cap)`. Since `SparseTensorEnumerator` must already construct the pushforward array for its own reasons, I fail to see what value `newSparseTensorCOO` adds that would make it worth intentionally duplicating that work. Note that this is rather different than the situation with `newSparseTensor`. Because `newSparseTensor` does relatively unique things like comparing the runtime sizes against the static shape, and in D122061 it's overloaded on the type of the final argument. I'm still not convinced that it's the best abstraction (e.g., because `openSparseTensorCOO` also compares the runtime sizes against the static shape, which suggests there's a better place to draw the abstraction boundary), but it is at the very least a non-trivial abstraction
473	Will do. Can you send me the shell script you used for running those experiments?
640	Yeah, they can be private. I just used protected because that seems more legible to me (oh how I wish there was a way to restrict "friendship" to only these specific methods)
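
The point made for lines 411–414 about deleting only the copy operations can be checked with a small sketch. `SizesView` is a hypothetical class, not the enumerator from this revision. Because the copy constructor is user-declared (even as deleted), no move constructor is implicitly declared, so an attempted move resolves to the deleted copy constructor and is rejected at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical class that captures a reference which could become dangling.
class SizesView {
public:
  explicit SizesView(const std::vector<uint64_t> &sizes) : ref(sizes) {}

  // Only the copy operations are explicitly deleted.
  SizesView(const SizesView &) = delete;
  SizesView &operator=(const SizesView &) = delete;

  uint64_t at(uint64_t i) const {
    assert(i < ref.size() && "index out of range");
    return ref[i];
  }

private:
  const std::vector<uint64_t> &ref;
};

// Copying is rejected, as declared.
static_assert(!std::is_copy_constructible<SizesView>::value, "");
static_assert(!std::is_copy_assignable<SizesView>::value, "");
// Moving is rejected too: no move operations are implicitly declared, so
// overload resolution selects the deleted copy operations.
static_assert(!std::is_move_constructible<SizesView>::value, "");
static_assert(!std::is_move_assignable<SizesView>::value, "");
```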

wrengr mentioned this in D122936: [mlir][sparse] Moved the ElementConsumer typedef to a "type alias". Apr 13 2022, 3:09 PM

Harbormaster completed remote builds in B159561: Diff 422674. Apr 13 2022, 3:23 PM

Thanks Wren. If the performance results are good, this is getting close to LGTM.

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
453	Yeah, I am not pushing back against this part of the change per se, I just like the invariant that I had in newSparseTensorCOO during debugging. I can get used to looking for another invariant in the new constructor ;-)
473	I don't think any of my experiments directly apply, since I measured reading tensors. I think you will need to tightly time some of the conversions, in particular around the toCOO method. I am just curious if the function call shows up at all or not. I find auto t_start = std::chrono::steady_clock::now(); ... auto t_end = std::chrono::steady_clock::now(); extremely useful to time tight sections of code, and report very specific timings for a small set of instructions.
640	Ah, I see your point in making it apply to just one section. But yeah, this is fine too, less confusion than seeing a protected section.
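
For readers unfamiliar with the "pushforward" array discussed for line 453, here is a minimal sketch. It assumes that pushforward means scattering each input entry to the position given by the permutation, i.e. `out[perm[r]] = in[r]`, which matches the loop in the old toCOO code (`orgsz[rev[r]] = sizes[r]`). The function name comes from the comment above, but the signature is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns `out` with out[perm[r]] = values[r].
// Precondition (assumed): `perm` is a permutation of 0..rank-1.
inline std::vector<uint64_t>
pushforward(const std::vector<uint64_t> &perm,
            const std::vector<uint64_t> &values) {
  const uint64_t rank = perm.size();
  assert(values.size() == rank && "rank mismatch");
  std::vector<uint64_t> out(rank);
  for (uint64_t r = 0; r < rank; r++) {
    assert(perm[r] < rank && "permutation entry out of range");
    out[perm[r]] = values[r];
  }
  return out;
}
```

With `perm = {2, 0, 1}` and `values = {10, 20, 30}`, the result is `{20, 30, 10}`.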

Removed the intermediate class EnumerableSparseTensorStorage<V>.

I finally figured out how to reuse SparseTensorStorageBase::getValues in lieu of the EnumerableSparseTensorStorage<V>::getValue method. So I've moved the other two methods into the SparseTensorStorageBase class itself. This loses a little bit of type safety since the SparseTensorEnumerator constructor now has a new type invariant. But it means the SparseTensorStorage class no longer needs to qualify all the inherited methods, which is a considerable amount of cleanup.

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
473	Ah, I was hoping it was scripted away rather than requiring such invasive checking. I'll see if I can't find some decently reliable way to test it. (I'm used to using https://github.com/haskell/criterion which handles all manner of complications re proper benchmarking; though I've no idea whether C++ has anything remotely analogous.)
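
The `std::chrono::steady_clock` timing suggested for line 473 could look roughly like the sketch below. The helper names `timeNanos` and `medianNanos` are hypothetical, and taking the median over several repetitions is just one simple way to damp run-to-run noise. It is not a substitute for a proper benchmarking harness.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: elapsed nanoseconds for one call of `work`.
inline int64_t timeNanos(const std::function<void()> &work) {
  auto t_start = std::chrono::steady_clock::now();
  work();
  auto t_end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(t_end - t_start)
      .count();
}

// Hypothetical helper: median elapsed time over `reps` runs.
inline int64_t medianNanos(const std::function<void()> &work, unsigned reps) {
  assert(reps > 0 && "need at least one repetition");
  std::vector<int64_t> samples;
  samples.reserve(reps);
  for (unsigned i = 0; i < reps; ++i)
    samples.push_back(timeNanos(work));
  // Partially sort so that the middle element is the median.
  auto mid = samples.begin() + reps / 2;
  std::nth_element(samples.begin(), mid, samples.end());
  return *mid;
}
```

Two caveats: the `std::function` wrapper adds its own call overhead, which matters when the timed region is very small, and the compiler may remove work whose result is never observed, so the timed code should produce a value that is used afterwards.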

Harbormaster completed remote builds in B159583: Diff 422704. Apr 13 2022, 5:59 PM

rebasing to fix spurious build failure

Harbormaster completed remote builds in B159592: Diff 422716. Apr 13 2022, 7:24 PM

Do you have some experimental validation to report before we proceed with this?

Unfortunately the benchmarks have been... annoying. I finished writing them up and ran them at the end of last week, and they showed <1% slowdown. I was going to post a comment to that effect on Monday, but I wanted to rerun them just to be sure, and even though I hadn't touched the code (neither for this CL, nor for the benchmark itself, nor rebasing for the recent upstream changes) suddenly it was showing 2~15% slowdown. That has undermined my belief in the credibility of the benchmarks. So this week I've been trying to figure out how to improve the reliability of the benchmarks, as well as trying to track down where the slowdown is coming from (assuming it's not spurious).

After banging away at things, I seem to have come up with a version that has -4.82~-6.79% slowdown (i.e., 5~7% speedup). I need to check a few more things to make sure these results are actually valid, then I'll upload the new version.

Refactoring to minimize overhead (namely splitting the enumerator class up so that we can avoid the cost of virtual-method calls within the loop-nest). Current benchmarks indicate this differential has no statistically significant difference in cpu time compared to the baseline; or on occasion is somewhat faster than the baseline.

Also rebasing to incorporate recent changes (D124502, D124875, D124475).

wrengr marked an inline comment as done. May 11 2022, 1:12 PM

wrengr added inline comments.

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
473	I seem to have finally made the benchmarks report more stable numbers. There's still more variation than I would like, but they reliably show this differential to have the same (or better) performance. Only in rare cases is there any regression, and those cases are still <1%

Harbormaster completed remote builds in B163963: Diff 428751. May 11 2022, 2:03 PM

Rerunning git-clang-format

Harbormaster completed remote builds in B164001: Diff 428797. May 11 2022, 5:42 PM

Thanks for your patience during the review, Wren. It has been a long road, but nice to see this new abstraction!

This revision is now accepted and ready to land. May 12 2022, 12:00 PM

This revision was landed with ongoing or failed builds. May 12 2022, 5:06 PM

Closed by commit rG753fe330c1d6: [mlir][sparse] Factoring out an enumerator over elements of SparseTensorStorage (authored by wrengr).

This revision was automatically updated to reflect the committed changes.

wrengr added a commit: rG753fe330c1d6: [mlir][sparse] Factoring out an enumerator over elements of SparseTensorStorage.

Revision Contents

Path		Size
mlir/lib/ExecutionEngine/SparseTensorUtils.cpp		218 lines

Diff 429105

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp

[... 21 lines elided ...]
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cctype>		#include <cctype>
#include <cinttypes>		#include <cinttypes>
#include <cstdio>		#include <cstdio>
#include <cstdlib>		#include <cstdlib>
#include <cstring>		#include <cstring>
#include <fstream>		#include <fstream>
		#include <functional>
#include <iostream>		#include <iostream>
#include <limits>		#include <limits>
#include <numeric>		#include <numeric>
#include <vector>		#include <vector>

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Internal support for storing and reading sparse tensors.		// Internal support for storing and reading sparse tensors.
[... 51 lines elided ...]
/// (2) centralizes the memory reservation and (re)allocation to one place.		/// (2) centralizes the memory reservation and (re)allocation to one place.
template <typename V>		template <typename V>
struct Element {		struct Element {
Element(uint64_t *ind, V val) : indices(ind), value(val){};		Element(uint64_t *ind, V val) : indices(ind), value(val){};
uint64_t *indices; // pointer into shared index pool		uint64_t *indices; // pointer into shared index pool
V value;		V value;
};		};

		/// The type of callback functions which receive an element. We avoid
		/// packaging the coordinates and value together as an `Element` object
		/// because this helps keep code somewhat cleaner.
		template <typename V>
		using ElementConsumer =
		const std::function<void(const std::vector<uint64_t> &, V)> &;

/// A memory-resident sparse tensor in coordinate scheme (collection of		/// A memory-resident sparse tensor in coordinate scheme (collection of
/// elements). This data structure is used to read a sparse tensor from		/// elements). This data structure is used to read a sparse tensor from
/// any external format into memory and sort the elements lexicographically		/// any external format into memory and sort the elements lexicographically
/// by indices before passing it back to the client (most packed storage		/// by indices before passing it back to the client (most packed storage
/// formats require the elements to appear in lexicographic index order).		/// formats require the elements to appear in lexicographic index order).
template <typename V>		template <typename V>
struct SparseTensorCOO {		struct SparseTensorCOO {
public:		public:
[... 110 lines elided ...]
/// semantic-ordering of dimensions to this object's storage-order.		/// semantic-ordering of dimensions to this object's storage-order.
/// The `szs` and `sparsity` arrays are already in storage-order.		/// The `szs` and `sparsity` arrays are already in storage-order.
///		///
/// Precondition: `perm` and `sparsity` must be valid for `szs.size()`.		/// Precondition: `perm` and `sparsity` must be valid for `szs.size()`.
SparseTensorStorageBase(const std::vector<uint64_t> &szs,		SparseTensorStorageBase(const std::vector<uint64_t> &szs,
const uint64_t *perm, const DimLevelType *sparsity)		const uint64_t *perm, const DimLevelType *sparsity)
: dimSizes(szs), rev(getRank()),		: dimSizes(szs), rev(getRank()),
dimTypes(sparsity, sparsity + getRank()) {		dimTypes(sparsity, sparsity + getRank()) {
		assert(perm && sparsity);
const uint64_t rank = getRank();		const uint64_t rank = getRank();
// Validate parameters.		// Validate parameters.
assert(rank > 0 && "Trivial shape is unsupported");		assert(rank > 0 && "Trivial shape is unsupported");
for (uint64_t r = 0; r < rank; r++) {		for (uint64_t r = 0; r < rank; r++) {
assert(dimSizes[r] > 0 && "Dimension size zero has trivial storage");		assert(dimSizes[r] > 0 && "Dimension size zero has trivial storage");
assert((dimTypes[r] == DimLevelType::kDense ||		assert((dimTypes[r] == DimLevelType::kDense ||
dimTypes[r] == DimLevelType::kCompressed) &&		dimTypes[r] == DimLevelType::kCompressed) &&
"Unsupported DimLevelType");		"Unsupported DimLevelType");
[... 74 lines elided ...]
}		}
virtual void expInsert(uint64_t *, int8_t *, bool *, uint64_t *, uint64_t) {		virtual void expInsert(uint64_t *, int8_t *, bool *, uint64_t *, uint64_t) {
fatal("expi8");		fatal("expi8");
}		}

/// Finishes insertion.		/// Finishes insertion.
virtual void endInsert() = 0;		virtual void endInsert() = 0;

		protected:
		// Since this class is virtual, we must disallow public copying in
		// order to avoid "slicing". Since this class has data members,
		// that means making copying protected.
		// <https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rc-copy-virtual>
		SparseTensorStorageBase(const SparseTensorStorageBase &) = default;
		// Copy-assignment would be implicitly deleted (because `dimSizes`
		// is const), so we explicitly delete it for clarity.
		SparseTensorStorageBase &operator=(const SparseTensorStorageBase &) = delete;

private:		private:
static void fatal(const char *tp) {		static void fatal(const char *tp) {
fprintf(stderr, "unsupported %s\n", tp);		fprintf(stderr, "unsupported %s\n", tp);
exit(1);		exit(1);
}		}

const std::vector<uint64_t> dimSizes;		const std::vector<uint64_t> dimSizes;
std::vector<uint64_t> rev;		std::vector<uint64_t> rev;
const std::vector<DimLevelType> dimTypes;		const std::vector<DimLevelType> dimTypes;
};		};

		// Forward.
		template <typename P, typename I, typename V>
		class SparseTensorEnumerator;

/// A memory-resident sparse tensor using a storage scheme based on		/// A memory-resident sparse tensor using a storage scheme based on
/// per-dimension sparse/dense annotations. This data structure provides a		/// per-dimension sparse/dense annotations. This data structure provides a
/// bufferized form of a sparse tensor type. In contrast to generating setup		/// bufferized form of a sparse tensor type. In contrast to generating setup
/// methods for each differently annotated sparse tensor, this method provides		/// methods for each differently annotated sparse tensor, this method provides
/// a convenient "one-size-fits-all" solution that simply takes an input tensor		/// a convenient "one-size-fits-all" solution that simply takes an input tensor
/// and annotations to implement all required setup in a general manner.		/// and annotations to implement all required setup in a general manner.
template <typename P, typename I, typename V>		template <typename P, typename I, typename V>
class SparseTensorStorage : public SparseTensorStorageBase {		class SparseTensorStorage : public SparseTensorStorageBase {
public:		public:
/// Constructs a sparse tensor storage scheme with the given dimensions,		/// Constructs a sparse tensor storage scheme with the given dimensions,
/// permutation, and per-dimension dense/sparse annotations, using		/// permutation, and per-dimension dense/sparse annotations, using
/// the coordinate scheme tensor for the initial contents if provided.		/// the coordinate scheme tensor for the initial contents if provided.
///		///
/// Precondition: `perm` and `sparsity` must be valid for `szs.size()`.		/// Precondition: `perm` and `sparsity` must be valid for `szs.size()`.
SparseTensorStorage(const std::vector<uint64_t> &szs, const uint64_t *perm,		SparseTensorStorage(const std::vector<uint64_t> &szs, const uint64_t *perm,
const DimLevelType *sparsity,		const DimLevelType *sparsity,
SparseTensorCOO<V> *coo = nullptr)		SparseTensorCOO<V> *coo = nullptr)
: SparseTensorStorageBase(szs, perm, sparsity), pointers(getRank()),		: SparseTensorStorageBase(szs, perm, sparsity), pointers(getRank()),
indices(getRank()), idx(getRank()) {		indices(getRank()), idx(getRank()) {
// Provide hints on capacity of pointers and indices.		// Provide hints on capacity of pointers and indices.
// TODO: needs much fine-tuning based on actual sparsity; currently		// TODO: needs much fine-tuning based on actual sparsity; currently
// we reserve pointer/index space based on all previous dense		// we reserve pointer/index space based on all previous dense
// dimensions, which works well up to first sparse dim; but		// dimensions, which works well up to first sparse dim; but
// we should really use nnz and dense/sparse distribution.		// we should really use nnz and dense/sparse distribution.
bool allDense = true;		bool allDense = true;
uint64_t sz = 1;		uint64_t sz = 1;
for (uint64_t r = 0, rank = getRank(); r < rank; r++) {		for (uint64_t r = 0, rank = getRank(); r < rank; r++) {
if (isCompressedDim(r)) {		if (isCompressedDim(r)) {
// TODO: Take a parameter between 1 and `sizes[r]`, and multiply		// TODO: Take a parameter between 1 and `sizes[r]`, and multiply
// `sz` by that before reserving. (For now we just use 1.)		// `sz` by that before reserving. (For now we just use 1.)
pointers[r].reserve(sz + 1);		pointers[r].reserve(sz + 1);
pointers[r].push_back(0);		pointers[r].push_back(0);
indices[r].reserve(sz);		indices[r].reserve(sz);
sz = 1;		sz = 1;
allDense = false;		allDense = false;
} else { // Dense dimension.		} else { // Dense dimension.
sz = checkedMul(sz, getDimSizes()[r]);		sz = checkedMul(sz, getDimSizes()[r]);
}		}
}		}
// Then assign contents from coordinate scheme tensor if provided.		// Then assign contents from coordinate scheme tensor if provided.
if (coo) {		if (coo) {
// Ensure both preconditions of `fromCOO`.		// Ensure both preconditions of `fromCOO`.
assert(coo->getSizes() == getDimSizes() && "Tensor size mismatch");		assert(coo->getSizes() == getDimSizes() && "Tensor size mismatch");
coo->sort();		coo->sort();
// Now actually insert the `elements`.		// Now actually insert the `elements`.
const std::vector<Element<V>> &elements = coo->getElements();		const std::vector<Element<V>> &elements = coo->getElements();
uint64_t nnz = elements.size();		uint64_t nnz = elements.size();
values.reserve(nnz);		values.reserve(nnz);
fromCOO(elements, 0, nnz, 0);		fromCOO(elements, 0, nnz, 0);
} else if (allDense) {		} else if (allDense) {
values.resize(sz, 0);		values.resize(sz, 0);
}		}
}		}

~SparseTensorStorage() override = default;		~SparseTensorStorage() override = default;

/// Partially specialize these getter methods based on template types.		/// Partially specialize these getter methods based on template types.
void getPointers(std::vector<P> **out, uint64_t d) override {		void getPointers(std::vector<P> **out, uint64_t d) override {
assert(d < getRank());		assert(d < getRank());
*out = &pointers[d];		*out = &pointers[d];
}		}
void getIndices(std::vector<I> **out, uint64_t d) override {		void getIndices(std::vector<I> **out, uint64_t d) override {
assert(d < getRank());		assert(d < getRank());
*out = &indices[d];		*out = &indices[d];
}		}
void getValues(std::vector<V> **out) override { *out = &values; }		void getValues(std::vector<V> **out) override { *out = &values; }

/// Partially specialize lexicographical insertions based on template types.		/// Partially specialize lexicographical insertions based on template types.
void lexInsert(const uint64_t *cursor, V val) override {		void lexInsert(const uint64_t *cursor, V val) override {
// First, wrap up pending insertion path.		// First, wrap up pending insertion path.
uint64_t diff = 0;		uint64_t diff = 0;
uint64_t top = 0;		uint64_t top = 0;
if (!values.empty()) {		if (!values.empty()) {
diff = lexDiff(cursor);		diff = lexDiff(cursor);
endPath(diff + 1);		endPath(diff + 1);
top = idx[diff] + 1;		top = idx[diff] + 1;
[... 38 lines elided, resuming inside void endInsert() override ...]
else		else
endPath(0);		endPath(0);
}		}

/// Returns this sparse tensor storage scheme as a new memory-resident		/// Returns this sparse tensor storage scheme as a new memory-resident
/// sparse tensor in coordinate scheme with the given dimension order.		/// sparse tensor in coordinate scheme with the given dimension order.
///		///
/// Precondition: `perm` must be valid for `getRank()`.		/// Precondition: `perm` must be valid for `getRank()`.
SparseTensorCOO<V> *toCOO(const uint64_t *perm) {		SparseTensorCOO<V> *toCOO(const uint64_t *perm) const {
// Restore original order of the dimension sizes and allocate coordinate		SparseTensorEnumerator<P, I, V> enumerator(*this, getRank(), perm);
// scheme with desired new ordering specified in perm.		SparseTensorCOO<V> *coo =
const uint64_t rank = getRank();		new SparseTensorCOO<V>(enumerator.permutedSizes(), values.size());
const auto &rev = getRev();		enumerator.forallElements([&coo](const std::vector<uint64_t> &ind, V val) {
const auto &sizes = getDimSizes();		coo->add(ind, val);
std::vector<uint64_t> orgsz(rank);		});
for (uint64_t r = 0; r < rank; r++)
aartbik: I still find the original code, which first restores the original sizes, a bit more intuitive (during debugging, I always check the factory point as entry to see if all is right), and it allows us to use the factory method `newSparseTensorCOO` exclusively, rather than introducing a direct `new` here. Given how much overhead you introduce elsewhere, is saving the extra loop really worth it?
wrengr (Author): This part of the reorg isn't about improving performance (though that is a consequence); it's about having the right abstractions. I don't think `newSparseTensorCOO` is a particularly good abstraction. The only things it does are construct the pushforward array (which is itself a very good abstraction that's also used in several other places), assert non-zero sizes (which can be abstracted into several other places), and then call the constructor. If we factor out the code for constructing pushforward arrays, then `newSparseTensorCOO` is just a macro: `newSparseTensorCOO(rank, sizes, perm, cap) === new SparseTensorCOO(pushforward(rank, perm, sizes), cap)`. Since `SparseTensorEnumerator` must already construct the pushforward array for its own reasons, I fail to see what value `newSparseTensorCOO` adds that would make it worth intentionally duplicating that work. Note that this is rather different from the situation with `newSparseTensor`, because `newSparseTensor` does relatively unique things like comparing the runtime sizes against the static shape, and in D122061 it's overloaded on the type of the final argument. I'm still not convinced that it's the best abstraction (e.g., because `openSparseTensorCOO` also compares the runtime sizes against the static shape, which suggests there's a better place to draw the abstraction boundary), but it is at the very least a non-trivial abstraction.
aartbik: Yeah, I am not pushing back against this part of the change per se; I just like the invariant that I had in `newSparseTensorCOO` during debugging. I can get used to looking for another invariant in the new constructor ;-)
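The "pushforward array" mentioned in this thread could be factored out roughly as below. This is a sketch under assumptions: `pushforward` is only the notation used in the comment, not a function confirmed to exist in the patch, and the direction of the mapping (`out[perm[r]] = sizes[r]`) is inferred from the `permsz[t] = sizes[s]` loop in the new enumerator constructor.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical free function: moves each dimension size to the position
// that `perm` assigns to its dimension, i.e. out[perm[r]] = sizes[r].
inline std::vector<uint64_t> pushforward(uint64_t rank, const uint64_t *perm,
                                         const uint64_t *sizes) {
  assert(perm && sizes && "Received nullptr");
  std::vector<uint64_t> out(rank);
  for (uint64_t r = 0; r < rank; ++r) {
    assert(perm[r] < rank && "perm must be a permutation of [0, rank)");
    assert(sizes[r] > 0 && "Dimension size zero has trivial storage");
    out[perm[r]] = sizes[r];
  }
  return out;
}
```

With a helper of this shape, the factory discussed above reduces to constructing the COO object from `pushforward(rank, perm, sizes)` and the capacity, which is the equivalence the author spells out.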
orgsz[rev[r]] = sizes[r];
SparseTensorCOO<V> *coo = SparseTensorCOO<V>::newSparseTensorCOO(
rank, orgsz.data(), perm, values.size());
// Populate coordinate scheme restored from old ordering and changed with
// new ordering. Rather than applying both reorderings during the recursion,
// we compute the combine permutation in advance.
std::vector<uint64_t> reord(rank);
for (uint64_t r = 0; r < rank; r++)
reord[r] = perm[rev[r]];
toCOO(*coo, reord, 0, 0);
// TODO: This assertion assumes there are no stored zeros,		// TODO: This assertion assumes there are no stored zeros,
// or if there are then that we don't filter them out.		// or if there are then that we don't filter them out.
// Cf., <https://github.com/llvm/llvm-project/issues/54179>		// Cf., <https://github.com/llvm/llvm-project/issues/54179>
		aartbik: If we start using such "end class" comments, let's do it everywhere.
assert(coo->getElements().size() == values.size());		assert(coo->getElements().size() == values.size());
return coo;		return coo;
}		}

/// Factory method. Constructs a sparse tensor storage scheme with the given		/// Factory method. Constructs a sparse tensor storage scheme with the given
/// dimensions, permutation, and per-dimension dense/sparse annotations,		/// dimensions, permutation, and per-dimension dense/sparse annotations,
/// using the coordinate scheme tensor for the initial contents if provided.		/// using the coordinate scheme tensor for the initial contents if provided.
/// In the latter case, the coordinate scheme must respect the same		/// In the latter case, the coordinate scheme must respect the same
[63 lines not shown]	private:
/// indices arrays under the given per-dimension dense/sparse annotations.		/// indices arrays under the given per-dimension dense/sparse annotations.
///		///
/// Preconditions:		/// Preconditions:
/// (1) the `elements` must be lexicographically sorted.		/// (1) the `elements` must be lexicographically sorted.
/// (2) the indices of every element are valid for `sizes` (equal rank		/// (2) the indices of every element are valid for `sizes` (equal rank
/// and pointwise less-than).		/// and pointwise less-than).
void fromCOO(const std::vector<Element<V>> &elements, uint64_t lo,		void fromCOO(const std::vector<Element<V>> &elements, uint64_t lo,
uint64_t hi, uint64_t d) {		uint64_t hi, uint64_t d) {
		uint64_t rank = getRank();
		assert(d <= rank && hi <= elements.size());
// Once dimensions are exhausted, insert the numerical values.		// Once dimensions are exhausted, insert the numerical values.
assert(d <= getRank() && hi <= elements.size());		if (d == rank) {
if (d == getRank()) {
assert(lo < hi);		assert(lo < hi);
values.push_back(elements[lo].value);		values.push_back(elements[lo].value);
return;		return;
}		}
// Visit all elements in this interval.		// Visit all elements in this interval.
uint64_t full = 0;		uint64_t full = 0;
while (lo < hi) { // If `hi` is unchanged, then `lo < elements.size()`.		while (lo < hi) { // If `hi` is unchanged, then `lo < elements.size()`.
// Find segment in interval with same index elements in this dimension.		// Find segment in interval with same index elements in this dimension.
uint64_t i = elements[lo].indices[d];		uint64_t i = elements[lo].indices[d];
uint64_t seg = lo + 1;		uint64_t seg = lo + 1;
while (seg < hi && elements[seg].indices[d] == i)		while (seg < hi && elements[seg].indices[d] == i)
seg++;		seg++;
// Handle segment in interval for sparse or dense dimension.		// Handle segment in interval for sparse or dense dimension.
appendIndex(d, full, i);		appendIndex(d, full, i);
full = i + 1;		full = i + 1;
fromCOO(elements, lo, seg, d + 1);		fromCOO(elements, lo, seg, d + 1);
// And move on to next segment in interval.		// And move on to next segment in interval.
lo = seg;		lo = seg;
}		}
// Finalize the sparse pointer structure at this dimension.		// Finalize the sparse pointer structure at this dimension.
finalizeSegment(d, full);		finalizeSegment(d, full);
}		}

/// Stores the sparse tensor storage scheme into a memory-resident sparse
/// tensor in coordinate scheme.
void toCOO(SparseTensorCOO<V> &tensor, std::vector<uint64_t> &reord,
uint64_t pos, uint64_t d) {
assert(d <= getRank());
if (d == getRank()) {
assert(pos < values.size());
tensor.add(idx, values[pos]);
} else if (isCompressedDim(d)) {
// Sparse dimension.
for (uint64_t ii = pointers[d][pos]; ii < pointers[d][pos + 1]; ii++) {
idx[reord[d]] = indices[d][ii];
toCOO(tensor, reord, ii, d + 1);
}
} else {
// Dense dimension.
const uint64_t sz = getDimSizes()[d];
const uint64_t off = pos * sz;
for (uint64_t i = 0; i < sz; i++) {
idx[reord[d]] = i;
toCOO(tensor, reord, off + i, d + 1);
}
}
}

/// Finalize the sparse pointer structure at this dimension.		/// Finalize the sparse pointer structure at this dimension.
void finalizeSegment(uint64_t d, uint64_t full = 0, uint64_t count = 1) {		void finalizeSegment(uint64_t d, uint64_t full = 0, uint64_t count = 1) {
if (count == 0)		if (count == 0)
return; // Short-circuit, since it'll be a nop.		return; // Short-circuit, since it'll be a nop.
if (isCompressedDim(d)) {		if (isCompressedDim(d)) {
appendPointer(d, indices[d].size(), count);		appendPointer(d, indices[d].size(), count);
} else { // Dense dimension.		} else { // Dense dimension.
const uint64_t sz = getDimSizes()[d];		const uint64_t sz = getDimSizes()[d];
[39 lines not shown]	for (uint64_t r = 0, rank = getRank(); r < rank; r++)
if (cursor[r] > idx[r])		if (cursor[r] > idx[r])
return r;		return r;
else		else
assert(cursor[r] == idx[r] && "non-lexicographic insertion");		assert(cursor[r] == idx[r] && "non-lexicographic insertion");
assert(0 && "duplication insertion");		assert(0 && "duplication insertion");
return -1u;		return -1u;
}		}

private:		// Allow `SparseTensorEnumerator` to access the data-members (to avoid
		aartbik: Does this need to be protected, or can it even be private, given that you "friend" this?
		wrengr (Author): Yeah, they can be private. I just used protected because that seems more legible to me (oh how I wish there was a way to restrict "friendship" to only these specific methods).
		aartbik: Ah, I see your point in making it apply to just one section. But yeah, this is fine too; less confusion than seeing a protected section.
		// the cost of virtual-function dispatch in inner loops), without
		// making them public to other client code.
		friend class SparseTensorEnumerator<P, I, V>;

std::vector<std::vector<P>> pointers;		std::vector<std::vector<P>> pointers;
std::vector<std::vector<I>> indices;		std::vector<std::vector<I>> indices;
std::vector<V> values;		std::vector<V> values;
std::vector<uint64_t> idx; // index cursor for lexicographic insertion.		std::vector<uint64_t> idx; // index cursor for lexicographic insertion.
};		};

		/// A (higher-order) function object for enumerating the elements of some
		/// `SparseTensorStorage` under a permutation. That is, the `forallElements`
		/// method encapsulates the loop-nest for enumerating the elements of
		/// the source tensor (in whatever order is best for the source tensor),
		/// and applies a permutation to the coordinates/indices before handing
		/// each element to the callback. A single enumerator object can be
		/// freely reused for several calls to `forallElements`, just so long
		/// as each call is sequential with respect to one another.
		///
		/// N.B., this class stores a reference to the `SparseTensorStorageBase`
		/// passed to the constructor; thus, objects of this class must not
		/// outlive the sparse tensor they depend on.
		///
		/// Design Note: The reason we define this class instead of simply using
		/// `SparseTensorEnumerator<P,I,V>` is because we need to hide/generalize
		/// the `<P,I>` template parameters from MLIR client code (to simplify the
		/// type parameters used for direct sparse-to-sparse conversion). And the
		/// reason we define the `SparseTensorEnumerator<P,I,V>` subclasses rather
		/// than simply using this class, is to avoid the cost of virtual-method
		/// dispatch within the loop-nest.
		template <typename V>
		class SparseTensorEnumeratorBase {
		public:
		/// Constructs an enumerator with the given permutation for mapping
		/// the semantic-ordering of dimensions to the desired target-ordering.
		///
		/// Preconditions:
		/// * the `tensor` must have the same `V` value type.
		/// * `perm` must be valid for `rank`.
		SparseTensorEnumeratorBase(const SparseTensorStorageBase &tensor,
		uint64_t rank, const uint64_t *perm)
		: src(tensor), permsz(src.getRev().size()), reord(getRank()),
		cursor(getRank()) {
		assert(perm && "Received nullptr for permutation");
		assert(rank == getRank() && "Permutation rank mismatch");
		const auto &rev = src.getRev(); // source stg-order -> semantic-order
		const auto &sizes = src.getDimSizes(); // in source storage-order
		for (uint64_t s = 0; s < rank; s++) { // `s` source storage-order
		uint64_t t = perm[rev[s]]; // `t` target-order
		reord[s] = t;
		permsz[t] = sizes[s];
		}
		}

		virtual ~SparseTensorEnumeratorBase() = default;

		// We disallow copying to help avoid leaking the `src` reference.
		// (In addition to avoiding the problem of slicing.)
		SparseTensorEnumeratorBase(const SparseTensorEnumeratorBase &) = delete;
		SparseTensorEnumeratorBase &
		operator=(const SparseTensorEnumeratorBase &) = delete;

		/// Returns the source/target tensor's rank. (The source-rank and
		/// target-rank are always equal since we only support permutations.
		/// Though once we add support for other dimension mappings, this
		/// method will have to be split in two.)
		uint64_t getRank() const { return permsz.size(); }

		/// Returns the target tensor's dimension sizes.
		const std::vector<uint64_t> &permutedSizes() const { return permsz; }

		/// Enumerates all elements of the source tensor, permutes their
		/// indices, and passes the permuted element to the callback.
		/// The callback must not store the cursor reference directly,
		/// since this function reuses the storage. Instead, the callback
		/// must copy it if they want to keep it.
		virtual void forallElements(ElementConsumer<V> yield) = 0;

		protected:
		const SparseTensorStorageBase &src;
		std::vector<uint64_t> permsz; // in target order.
		std::vector<uint64_t> reord; // source storage-order -> target order.
		std::vector<uint64_t> cursor; // in target order.
		};

		template <typename P, typename I, typename V>
		class SparseTensorEnumerator final : public SparseTensorEnumeratorBase<V> {
		using Base = SparseTensorEnumeratorBase<V>;

		public:
		/// Constructs an enumerator with the given permutation for mapping
		/// the semantic-ordering of dimensions to the desired target-ordering.
		///
		/// Precondition: `perm` must be valid for `rank`.
		SparseTensorEnumerator(const SparseTensorStorage<P, I, V> &tensor,
		uint64_t rank, const uint64_t *perm)
		: Base(tensor, rank, perm) {}

		~SparseTensorEnumerator() final override = default;

		void forallElements(ElementConsumer<V> yield) final override {
		forallElements(yield, 0, 0);
		}

		private:
		/// The recursive component of the public `forallElements`.
		void forallElements(ElementConsumer<V> yield, uint64_t parentPos,
		uint64_t d) {
		// Recover the `<P,I,V>` type parameters of `src`.
		const auto &src =
		static_cast<const SparseTensorStorage<P, I, V> &>(this->src);
		if (d == Base::getRank()) {
		assert(parentPos < src.values.size() &&
		"Value position is out of bounds");
		// TODO: <https://github.com/llvm/llvm-project/issues/54179>
		yield(this->cursor, src.values[parentPos]);
		} else if (src.isCompressedDim(d)) {
		// Look up the bounds of the `d`-level segment determined by the
		// `d-1`-level position `parentPos`.
		const std::vector<P> &pointers_d = src.pointers[d];
		assert(parentPos + 1 < pointers_d.size() &&
		"Parent pointer position is out of bounds");
		const uint64_t pstart = static_cast<uint64_t>(pointers_d[parentPos]);
		const uint64_t pstop = static_cast<uint64_t>(pointers_d[parentPos + 1]);
		// Loop-invariant code for looking up the `d`-level coordinates/indices.
		const std::vector<I> &indices_d = src.indices[d];
		assert(pstop - 1 < indices_d.size() && "Index position is out of bounds");
		uint64_t &cursor_reord_d = this->cursor[this->reord[d]];
		for (uint64_t pos = pstart; pos < pstop; pos++) {
		cursor_reord_d = static_cast<uint64_t>(indices_d[pos]);
		forallElements(yield, pos, d + 1);
		}
		} else { // Dense dimension.
		const uint64_t sz = src.getDimSizes()[d];
		const uint64_t pstart = parentPos * sz;
		uint64_t &cursor_reord_d = this->cursor[this->reord[d]];
		for (uint64_t i = 0; i < sz; i++) {
		cursor_reord_d = i;
		forallElements(yield, pstart + i, d + 1);
		}
		}
		}
		};
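To illustrate the callback contract described in the enumerator's doc comments, here is a reduced sketch for a single fixed layout (dense rows, compressed columns). `TinyCSR`, `Consumer`, and the free function `forallElements` are hypothetical stand-ins and not the classes from the patch; the real enumerator handles arbitrary rank recursively and keeps its cursor as a member.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical 2-D storage: dense outer dimension, compressed inner one.
struct TinyCSR {
  uint64_t rows;
  std::vector<uint64_t> pointers; // rows + 1 segment bounds
  std::vector<uint64_t> indices;  // column of each stored value
  std::vector<double> values;
};

using Consumer = std::function<void(const std::vector<uint64_t> &, double)>;

// Visits every stored element in storage order. Storage dimension `d` is
// written to cursor slot `reord[d]`, so the callback sees coordinates in
// the target order. The cursor is reused between calls to `yield`, so a
// callback that wants to keep the coordinates must copy them.
inline void forallElements(const TinyCSR &src,
                           const std::vector<uint64_t> &reord,
                           const Consumer &yield) {
  assert(reord.size() == 2 && "this sketch only handles rank 2");
  std::vector<uint64_t> cursor(2);
  for (uint64_t i = 0; i < src.rows; ++i) {
    cursor[reord[0]] = i; // dense dimension
    for (uint64_t pos = src.pointers[i]; pos < src.pointers[i + 1]; ++pos) {
      cursor[reord[1]] = src.indices[pos]; // compressed dimension
      yield(cursor, src.values[pos]);
    }
  }
}
```

Passing `reord = {1, 0}` hands the callback transposed coordinates, which is the same role the `reord` member plays in the classes above.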

/// Helper to convert string to lower case.		/// Helper to convert string to lower case.
static char *toLower(char *token) {		static char *toLower(char *token) {
for (char *c = token; *c; c++)		for (char *c = token; *c; c++)
*c = tolower(*c);		*c = tolower(*c);
return token;		return token;
}		}

/// Read the MME header of a general sparse matrix of type real.		/// Read the MME header of a general sparse matrix of type real.
[754 lines not shown]

[mlir][sparse] Factoring out an enumerator over elements of SparseTensorStorage (Closed, Public)

Diff 429105

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp