This is an archive of the discontinued LLVM Phabricator instance.

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
302	Note that the //==---- style of separation was only used for comments that introduce a whole new section. This is the first time it is used for just a class comment and it feels a bit out of place. So I would remove L239.
308	Since we are moving towards documenting more, perhaps some more information around L174, the base class, would be in place now too.
310	I like how you place this V-interface in between the typeless base class and the full tensor storage, very elegant!
314–315	Please do not refer to the design doc (which is most likely not accessible to the outside world and may go out of date). Also, in general, let's just document what we did, not what we could have done ;-)
322	How about making the members rev/sizes etc. part of the base class and making these non-virtual, inline getters instead? It will require a "super" constructor, of course, but it would make the part on who owns what a bit clearer.
437	If we start using such "end class" comments, let's do it everywhere.

Ah, those were notes-to-self for navigating the file during development. I think after D122061 lands we should split the file up to make navigation easier, though I'm still working out where the best splits would be.

wrengr mentioned this in D122928: [mlir][sparse] Moving <P,I,V>-invariant parts of SparseTensorStorage to base. Apr 1 2022, 11:55 AM

Factored out D122928 to address the request for reorganization

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
308	Any particular documentation you'd like to see there?
314–315	I actually left this note for you :) That is, so that once you got a chance to see the final code we could revisit the designs we talked about and decide which one to go with :)
322	sgtm

Harbormaster completed remote builds in B157476: Diff 419834. Apr 1 2022, 12:47 PM

wrengr edited the summary of this revision. Apr 1 2022, 12:47 PM

wrengr added a parent revision: D122928: [mlir][sparse] Moving <P,I,V>-invariant parts of SparseTensorStorage to base.

wrengr removed a parent revision: D122059: [mlir][sparse] Marking several things const/static.

wrengr added a child revision: D122936: [mlir][sparse] Moved the ElementConsumer typedef to a "type alias". Apr 1 2022, 1:41 PM

Cleaning up how EnumerableSparseTensorStorage delegates to constructors of its base class.

Harbormaster completed remote builds in B157511: Diff 419875. Apr 1 2022, 4:06 PM

rebase

wrengr removed a child revision: D122061: [mlir][sparse] Enhancing sparse=>sparse conversion. Apr 4 2022, 5:20 PM

wrengr added a child revision: D122061: [mlir][sparse] Enhancing sparse=>sparse conversion. Apr 4 2022, 5:20 PM

Harbormaster completed remote builds in B157868: Diff 420353. Apr 4 2022, 5:47 PM

rebase

Harbormaster completed remote builds in B157880: Diff 420368. Apr 4 2022, 7:43 PM

wrengr removed a child revision: D122061: [mlir][sparse] Enhancing sparse=>sparse conversion. Apr 5 2022, 12:27 PM

Rebasing for D123166. Also removing a bunch of inline keywords, per MLIR style-guide.

wrengr added a child revision: D122061: [mlir][sparse] Enhancing sparse=>sparse conversion. Apr 6 2022, 5:44 PM

Harbormaster completed remote builds in B158369: Diff 421051. Apr 6 2022, 5:52 PM

aartbik added inline comments. Apr 8 2022, 10:20 AM

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
455–456	Refresh my C++ knowledge ;-), but do we really need those Base:: qualifiers now? It is a method defined in a superclass, so shouldn't all the magic just work?

wrengr mentioned this in rG8d8b566f0c66: [mlir][sparse] Moving <P,I,V>-invariant parts of SparseTensorStorage to base. Apr 8 2022, 11:44 AM

wrengr added inline comments. Apr 8 2022, 11:51 AM

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
455–456	Alas we do :( Since the superclass is templated, C++ refuses to do name resolution for unqualified uses of superclass methods. https://www.cs.technion.ac.il/users/yechiel/c++-faq/nondependent-name-lookup-members.html
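
As a minimal sketch of the lookup rule described above (the class names here are hypothetical stand-ins, not the classes in this revision): inside a class template, unqualified names are not looked up in a base class that depends on a template parameter, so the call has to be made dependent via `Base::`, `this->`, or a using-declaration.

```cpp
#include <cstdint>

// Hypothetical stand-in for a templated base class.
template <typename V>
struct EnumerableBase {
  uint64_t getRank() const { return 3; }
};

// Hypothetical stand-in for a derived class template.
template <typename P, typename I, typename V>
struct Storage : EnumerableBase<V> {
  using Base = EnumerableBase<V>;

  // Qualifying the name makes it dependent, so lookup is deferred
  // until the template is instantiated.
  uint64_t viaQualifier() const { return Base::getRank(); }

  // `this->` has the same effect.
  uint64_t viaThis() const { return this->getRank(); }

  // An unqualified call would be rejected when the template is defined:
  // uint64_t unqualified() const { return getRank(); }
};

// Alternative: a single using-declaration brings the name into scope.
template <typename P, typename I, typename V>
struct StorageWithUsing : EnumerableBase<V> {
  using EnumerableBase<V>::getRank;
  uint64_t unqualified() const { return getRank(); }
};
```

If the base class does not depend on a template parameter, none of this is needed, which is why removing the templated intermediate class later in this review also removes the qualifiers.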

wrengr marked an inline comment as done. Apr 8 2022, 11:51 AM

aartbik added inline comments. Apr 12 2022, 5:52 PM

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
340–349	Do we use this constructor currently? If not, then please add it later when it is used.
370–373	Given that this is not a public-facing class, do we need to explicitly delete all of these? (I assume you use this so you don't accidentally do the wrong thing.)
424	I still find the original code, which first restores the original sizes, a bit more intuitive (during debugging, I always check the factory point as entry to see if all is right), and it allows us to use the factory method newSparseTensorCOO exclusively, rather than introducing a direct new here. Given how much overhead you introduce elsewhere, is saving the extra loop really worth it?
551	Given that toCOO() is rather central to a lot of previously measured performance operations, can you please do a before/after measurement with some large tensors (see e.g. our pre-print paper), just to make sure that the use of an ElementConsumer callback does not introduce too much overhead?
720	Does this need to be protected, or can it even be private, given that you "friend" this?

Addressing comments, fixing an issue about "slicing", and incorporating D122936

wrengr added inline comments. Apr 13 2022, 3:09 PM

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
370–373	Yep, these are to help prevent doing the wrong thing, since this class captures a reference that could become dangling. IMO it doesn't matter whether the class is public-facing or not; those of us working on the library are only human and thus fallible, so it's still beneficial to get the compiler to help guard against mistakes (especially since there's zero runtime cost to doing so!). Though it does look like it's sufficient to only explicitly delete the copy versions, since then the move versions will implicitly fall back to the defined(-as-deleted) copy versions.
424	This part of the reorg isn't about improving performance (though that is a consequence), it's about having the right abstractions. I don't think `newSparseTensorCOO` is a particularly good abstraction. The only thing it does is construct the pushforward array (which is itself a very good abstraction that's also used in several other places), assert non-zero sizes (which can be abstracted into several other places), and then call the constructor. If we factor out the code for constructing pushforward arrays, then `newSparseTensorCOO` is just a macro: `newSparseTensorCOO(rank, sizes, perm, cap) === new SparseTensorCOO(pushforward(rank, perm, sizes), cap)`. Since `SparseTensorEnumerator` must already construct the pushforward array for its own reasons, I fail to see what value `newSparseTensorCOO` adds that would make it worth intentionally duplicating that work. Note that this is rather different from the situation with `newSparseTensor`, because `newSparseTensor` does relatively unique things like comparing the runtime sizes against the static shape, and in D122061 it's overloaded on the type of the final argument. I'm still not convinced that it's the best abstraction (e.g., because `openSparseTensorCOO` also compares the runtime sizes against the static shape, which suggests there's a better place to draw the abstraction boundary), but it is at the very least a non-trivial abstraction.
551	Will do. Can you send me the shell script you used for running those experiments?
720	Yeah, they can be private. I just used protected because that seems more legible to me (oh how I wish there was a way to restrict "friendship" to only these specific methods)
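
The claim in the reply on lines 370–373, that deleting only the copy operations also rules out moves, can be checked with type traits. This is a small sketch using a hypothetical `RefHolder` class that, like the enumerator, stores a reference; it is not code from this revision.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical class that captures a reference to its argument.
class RefHolder {
public:
  explicit RefHolder(const std::vector<int> &v) : src(v) {}

  // Only the copy operations are declared (as deleted). Because they are
  // user-declared, no move operations are implicitly generated, and an
  // attempted move resolves to the deleted copy operation instead.
  RefHolder(const RefHolder &) = delete;
  RefHolder &operator=(const RefHolder &) = delete;

  std::size_t size() const { return src.size(); }

private:
  const std::vector<int> &src; // must not outlive the referenced vector
};

static_assert(!std::is_copy_constructible<RefHolder>::value, "no copy");
static_assert(!std::is_move_constructible<RefHolder>::value, "no move");
static_assert(!std::is_copy_assignable<RefHolder>::value, "no copy-assign");
static_assert(!std::is_move_assignable<RefHolder>::value, "no move-assign");
```

Spelling out all four deleted members, as the diff does, changes nothing for the compiler but documents the intent for readers.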

wrengr mentioned this in D122936: [mlir][sparse] Moved the ElementConsumer typedef to a "type alias". Apr 13 2022, 3:09 PM

Harbormaster completed remote builds in B159561: Diff 422674. Apr 13 2022, 3:23 PM

Thanks Wren. If the performance results are good, this is getting close to LGTM.

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
424	Yeah, I am not pushing back against this part of the change per se, I just like the invariant that I had in newSparseTensorCOO during debugging. I can get used to looking for another invariant in the new constructor ;-)
551	I don't think any of my experiments directly apply, since I measured reading tensors. I think you will need to tightly time some of the conversions, in particular around the toCOO method. I am just curious if the function call shows up at all or not. I find auto t_start = std::chrono::steady_clock::now(); ... auto t_end = std::chrono::steady_clock::now(); extremely useful to time tight sections of code, and report very specific timings for a small set of instructions.
720	Ah, I see your point in making it apply to just one section. But yeah, this is fine too, less confusion than seeing a protected section.
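
For readers following the line-424 discussion, the "pushforward" of a sizes array under a permutation might look like the sketch below. The function name and argument order come from the comment above, but the body is an assumption: it takes `perm` to map each source dimension `r` to its target position `perm[r]`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the sizes reordered so that the size of
// source dimension `r` ends up at target position `perm[r]`.
// Precondition (assumed): `perm` is a permutation of 0..rank-1.
std::vector<uint64_t> pushforward(uint64_t rank, const uint64_t *perm,
                                  const uint64_t *sizes) {
  assert(perm && sizes);
  std::vector<uint64_t> out(rank);
  for (uint64_t r = 0; r < rank; r++) {
    assert(perm[r] < rank && "Permutation entry out of range");
    out[perm[r]] = sizes[r];
  }
  return out;
}
```

With `perm = {2, 0, 1}` and `sizes = {10, 20, 30}`, the result is `{20, 30, 10}`. Under this reading, the factory method reduces to a pushforward followed by a constructor call, which is the point made in the reply.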

Removed the intermediate class EnumerableSparseTensorStorage<V>.

I finally figured out how to reuse SparseTensorStorageBase::getValues in lieu of the EnumerableSparseTensorStorage<V>::getValue method. So I've moved the other two methods into the SparseTensorStorageBase class itself. This loses a little bit of type safety since the SparseTensorEnumerator constructor now has a new type invariant. But it means the SparseTensorStorage class no longer needs to qualify all the inherited methods, which is a considerable amount of cleanup.

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
551	Ah, I was hoping it was scripted away rather than requiring such invasive checking. I'll see if I can't find some decently reliable way to test it. (I'm used to using https://github.com/haskell/criterion which handles all manner of complications re proper benchmarking; though I've no idea whether C++ has anything remotely analogous.)
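
The `steady_clock` approach suggested for line 551 can be wrapped in a small helper. This is only a sketch: the names `timeNanos` and `medianNanos` are hypothetical, and taking the median of repeated runs is one simple way to dampen run-to-run noise, not a substitute for a proper benchmarking harness.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Runs `fn` once and returns the elapsed time in nanoseconds, measured
// with a monotonic clock so wall-clock adjustments cannot skew it.
template <typename Fn>
int64_t timeNanos(Fn &&fn) {
  const auto tStart = std::chrono::steady_clock::now();
  fn();
  const auto tEnd = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(tEnd - tStart)
      .count();
}

// Runs `fn` `reps` times and returns the median elapsed time.
template <typename Fn>
int64_t medianNanos(Fn &&fn, unsigned reps) {
  assert(reps > 0 && "Need at least one repetition");
  std::vector<int64_t> samples;
  samples.reserve(reps);
  for (unsigned i = 0; i < reps; i++)
    samples.push_back(timeNanos(fn));
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

A before/after comparison would call `medianNanos` around the conversion under test in both builds, using the same input tensor and repetition count.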

Harbormaster completed remote builds in B159583: Diff 422704. Apr 13 2022, 5:59 PM

rebasing to fix spurious build failure

Harbormaster completed remote builds in B159592: Diff 422716. Apr 13 2022, 7:24 PM

Do you have some experimental validation to report before we proceed with this?

Unfortunately the benchmarks have been... annoying. I finished writing them up and ran them at the end of last week, and they showed <1% slowdown. I was going to post a comment to that effect on Monday, but I wanted to rerun them just to be sure; and even though I hadn't touched the code (neither for this CL, nor for the benchmark itself, nor rebasing for the recent upstream changes), suddenly it was showing 2~15% slowdown. That has undermined my belief in the credibility of the benchmarks. So this week I've been trying to figure out how to improve the reliability of the benchmarks, as well as trying to track down where the slowdown is coming from (assuming it's not spurious).

After banging away at things, I seem to have come up with a version that has -4.82~-6.79% slowdown (i.e., 5~7% speedup). I need to check a few more things to make sure these results are actually valid, then I'll upload the new version.

Refactoring to minimize overhead (namely splitting the enumerator class up so that we can avoid the cost of virtual-method calls within the loop-nest). Current benchmarks indicate this differential has no statistically significant difference in CPU time compared to the baseline, and on occasion is somewhat faster than the baseline.

Also rebasing to incorporate recent changes (D124502, D124875, D124475).
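
To illustrate the kind of loop-nest being discussed, here is a simplified, non-recursive sketch of callback-driven enumeration over a two-level storage (dense rows, compressed columns). The `Csr` struct and this free-function form of `forallElements` are assumptions made for illustration; they are not the classes in this revision, which handle arbitrary rank recursively.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical two-level storage: level 0 is dense, level 1 is compressed.
struct Csr {
  uint64_t rows;
  std::vector<uint64_t> pointers; // rows + 1 entries delimiting each row
  std::vector<uint64_t> indices;  // column index of each stored value
  std::vector<double> values;     // stored values
};

// Callback type receiving the permuted coordinates and the value.
using ElementConsumer =
    std::function<void(const std::vector<uint64_t> &, double)>;

// Visits every stored element; `reord[d]` gives the target position of
// source level `d`. The cursor is reused between calls, so a callback
// that wants to keep the coordinates must copy them.
void forallElements(const Csr &src, const std::vector<uint64_t> &reord,
                    const ElementConsumer &yield) {
  std::vector<uint64_t> cursor(2);
  for (uint64_t i = 0; i < src.rows; i++) { // dense level
    cursor[reord[0]] = i;
    for (uint64_t pos = src.pointers[i]; pos < src.pointers[i + 1]; pos++) {
      cursor[reord[1]] = src.indices[pos]; // compressed level
      yield(cursor, src.values[pos]);
    }
  }
}
```

Because the storage type is concrete here, the only indirect call inside the loop is the callback itself, which is the overhead the benchmarks in this thread were trying to measure.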

wrengr marked an inline comment as done. May 11 2022, 1:12 PM

wrengr added inline comments.

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
551	I seem to have finally made the benchmarks report more stable numbers. There's still more variation than I would like, but they reliably show this differential to have the same (or better) performance. Only in rare cases is there any regression, and those cases are still <1%

Harbormaster completed remote builds in B163963: Diff 428751. May 11 2022, 2:03 PM

Rerunning git-clang-format

Harbormaster completed remote builds in B164001: Diff 428797. May 11 2022, 5:42 PM

Thanks for your patience during the review, Wren. It has been a long road, but nice to see this new abstraction!

This revision is now accepted and ready to land. May 12 2022, 12:00 PM

This revision was landed with ongoing or failed builds. May 12 2022, 5:06 PM

Closed by commit rG753fe330c1d6: [mlir][sparse] Factoring out an enumerator over elements of SparseTensorStorage (authored by wrengr).

This revision was automatically updated to reflect the committed changes.

wrengr added a commit: rG753fe330c1d6: [mlir][sparse] Factoring out an enumerator over elements of SparseTensorStorage.

Revision Contents

Path	Size
mlir/lib/ExecutionEngine/SparseTensorUtils.cpp	243 lines

Diff 420368

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp

Show All 21 Lines
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cctype>		#include <cctype>
#include <cinttypes>		#include <cinttypes>
#include <cstdio>		#include <cstdio>
#include <cstdlib>		#include <cstdlib>
#include <cstring>		#include <cstring>
#include <fstream>		#include <fstream>
		#include <functional>
#include <iostream>		#include <iostream>
#include <limits>		#include <limits>
#include <numeric>		#include <numeric>
#include <vector>		#include <vector>

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Internal support for storing and reading sparse tensors.		// Internal support for storing and reading sparse tensors.
▲ Show 20 Lines • Show All 151 Lines • ▼ Show 20 Lines	public:
/// semantic-ordering of dimensions to this object's storage-order.		/// semantic-ordering of dimensions to this object's storage-order.
/// The `szs` and `sparsity` arrays are already in storage-order.		/// The `szs` and `sparsity` arrays are already in storage-order.
///		///
/// Precondition: `perm` and `sparsity` must be valid for `szs.size()`.		/// Precondition: `perm` and `sparsity` must be valid for `szs.size()`.
SparseTensorStorageBase(const std::vector<uint64_t> &szs,		SparseTensorStorageBase(const std::vector<uint64_t> &szs,
const uint64_t *perm, const DimLevelType *sparsity)		const uint64_t *perm, const DimLevelType *sparsity)
: dimSizes(szs), rev(getRank()),		: dimSizes(szs), rev(getRank()),
dimTypes(sparsity, sparsity + getRank()) {		dimTypes(sparsity, sparsity + getRank()) {
		assert(perm && sparsity);
const uint64_t rank = getRank();		const uint64_t rank = getRank();
// Validate parameters.		// Validate parameters.
assert(rank > 0 && "Trivial shape is unsupported");		assert(rank > 0 && "Trivial shape is unsupported");
for (uint64_t r = 0; r < rank; r++) {		for (uint64_t r = 0; r < rank; r++) {
assert(dimSizes[r] > 0 && "Dimension size zero has trivial storage");		assert(dimSizes[r] > 0 && "Dimension size zero has trivial storage");
assert((dimTypes[r] == DimLevelType::kDense ||		assert((dimTypes[r] == DimLevelType::kDense ||
dimTypes[r] == DimLevelType::kCompressed) &&		dimTypes[r] == DimLevelType::kCompressed) &&
"Unsupported DimLevelType");		"Unsupported DimLevelType");
▲ Show 20 Lines • Show All 87 Lines • ▼ Show 20 Lines	static void fatal(const char *tp) {
exit(1);		exit(1);
}		}

const std::vector<uint64_t> dimSizes;		const std::vector<uint64_t> dimSizes;
std::vector<uint64_t> rev;		std::vector<uint64_t> rev;
const std::vector<DimLevelType> dimTypes;		const std::vector<DimLevelType> dimTypes;
};		};

		/// This class provides the interface required by `SparseTensorEnumerator<V>`,
		/// and is a superclass of `SparseTensorStorage<P,I,V>` for abstracting
		/// over the `<P,I>` templating. We need that abstraction so that direct
		/// sparse-to-sparse conversion need not take separate `<P,I>` parameters
		/// for the source vs the target (nor require that the source parameters
		/// match the target parameters).
		template <typename V>
		class EnumerableSparseTensorStorage : public SparseTensorStorageBase {
		public:
		using SparseTensorStorageBase::SparseTensorStorageBase;

		/// Looks up the value stored at the given position.
		virtual V getValue(uint64_t pos) const = 0;

		/// Looks up the `d`-level position/pointer stored at the given
		/// `d-1`-level position, and converts the stored `P` into `uint64_t`.
		virtual uint64_t getPointer(uint64_t d, uint64_t parentPos) const = 0;

		/// Looks up the `d`-level coordinate/index stored at the given
		/// `d`-level position, and converts the stored `I` into `uint64_t`.
		virtual uint64_t getIndex(uint64_t d, uint64_t pos) const = 0;
		};

		/// A (higher-order) function object for enumerating the elements of some
		/// `SparseTensorStorage` under a permutation. That is, the `forallElements`
		/// method encapsulates the loop-nest for enumerating the elements of
		/// the source tensor (in whatever order is best for the source tensor),
		/// and applies a permutation to the coordinates/indices before handing
		/// each element to the callback. A single enumerator object can be
		/// freely reused for several calls to `forallElements`, just so long
		/// as each call is sequential with respect to one another.
		///
		/// N.B., this class stores a reference to the `EnumerableSparseTensorStorage`
		/// passed to the constructor; thus, objects of this class must not
		/// outlive the sparse tensor they depend on.
		template <typename V>
		class SparseTensorEnumerator {
		public:
		/// Constructs an enumerator with the identity permutation, thus
		/// enumerating elements with the semantic-ordering of dimensions.
		explicit SparseTensorEnumerator(
		const EnumerableSparseTensorStorage<V> &tensor)
		: src(tensor), permsz(src.getRev().size()), reord(src.getRev()),
		cursor(getRank()) {
		const auto &sizes = src.getDimSizes();
		for (uint64_t rank = getRank(), s = 0; s < rank; s++)
		permsz[reord[s]] = sizes[s];
		}

		/// Constructs an enumerator with the given permutation for mapping
		/// the semantic-ordering of dimensions to the desired target-ordering.
		///
		/// Precondition: `perm` must be valid for `rank`.
		SparseTensorEnumerator(const EnumerableSparseTensorStorage<V> &tensor,
		uint64_t rank, const uint64_t *perm)
		: src(tensor), permsz(src.getRev().size()), reord(getRank()),
		cursor(getRank()) {
		assert(perm);
		assert(rank == getRank() && "Permutation rank mismatch");
		const auto &rev = src.getRev(); // source stg-order -> semantic-order
		const auto &sizes = src.getDimSizes(); // in source storage-order
		for (uint64_t s = 0; s < rank; s++) { // `s` source storage-order
		uint64_t t = perm[rev[s]]; // `t` target-order
		reord[s] = t;
		permsz[t] = sizes[s];
		}
		}

		SparseTensorEnumerator(const SparseTensorEnumerator &) = delete;
		SparseTensorEnumerator(SparseTensorEnumerator &&) = delete;
		SparseTensorEnumerator &operator=(const SparseTensorEnumerator &) = delete;
		SparseTensorEnumerator &operator=(SparseTensorEnumerator &&) = delete;

		/// Returns the source/target tensor's rank. (The source-rank and
		/// target-rank are always equal since we only support permutations.
		/// Though once we add support for other dimension mappings, this
		/// method will have to be split in two.)
		inline uint64_t getRank() const { return permsz.size(); }

		/// Returns the target tensor's dimension sizes.
		inline const std::vector<uint64_t> &permutedSizes() const { return permsz; }

		/// The type of callback functions which receive an element (in target
		/// order). We avoid packaging the coordinates and value together
		/// as an `Element` object because this helps keep code somewhat cleaner.
		typedef const std::function<void(const std::vector<uint64_t> &, V)>
		&ElementConsumer;

		/// Enumerates all elements of the source tensor, permutes their
		/// indices, and passes the permuted element to the callback.
		/// The callback must not store the cursor reference directly, since
		/// this function reuses the storage. Instead, the callback must copy
		/// it if they want to keep it.
		inline void forallElements(ElementConsumer yield) {
		forallElements(yield, 0, 0);
		}

		private:
		/// The recursive component of the public `forallElements`.
		void forallElements(ElementConsumer yield, uint64_t parentPos, uint64_t d) {
		if (d == getRank()) {
		// TODO: <https://github.com/llvm/llvm-project/issues/54179>
		yield(cursor, src.getValue(parentPos));
		} else if (src.isCompressedDim(d)) {
		const uint64_t pstart = src.getPointer(d, parentPos);
		const uint64_t pstop = src.getPointer(d, parentPos + 1);
		for (uint64_t pos = pstart; pos < pstop; pos++) {
		cursor[reord[d]] = src.getIndex(d, pos);
		forallElements(yield, pos, d + 1);
		}
		} else { // Dense dimension.
		const uint64_t sz = src.getDimSizes()[d];
		const uint64_t pstart = parentPos * sz;
		for (uint64_t i = 0; i < sz; i++) {
		cursor[reord[d]] = i;
		forallElements(yield, pstart + i, d + 1);
		}
		}
		}

		const EnumerableSparseTensorStorage<V> &src;
		std::vector<uint64_t> permsz; // in target order.
		std::vector<uint64_t> reord; // source storage-order -> target order.
		std::vector<uint64_t> cursor; // in target order.
		};

/// A memory-resident sparse tensor using a storage scheme based on		/// A memory-resident sparse tensor using a storage scheme based on
/// per-dimension sparse/dense annotations. This data structure provides a		/// per-dimension sparse/dense annotations. This data structure provides a
/// bufferized form of a sparse tensor type. In contrast to generating setup		/// bufferized form of a sparse tensor type. In contrast to generating setup
/// methods for each differently annotated sparse tensor, this method provides		/// methods for each differently annotated sparse tensor, this method provides
/// a convenient "one-size-fits-all" solution that simply takes an input tensor		/// a convenient "one-size-fits-all" solution that simply takes an input tensor
/// and annotations to implement all required setup in a general manner.		/// and annotations to implement all required setup in a general manner.
template <typename P, typename I, typename V>		template <typename P, typename I, typename V>
class SparseTensorStorage : public SparseTensorStorageBase {		class SparseTensorStorage : public EnumerableSparseTensorStorage<V> {
		using Base = EnumerableSparseTensorStorage<V>;

public:		public:
/// Constructs a sparse tensor storage scheme with the given dimensions,		/// Constructs a sparse tensor storage scheme with the given dimensions,
/// permutation, and per-dimension dense/sparse annotations, using		/// permutation, and per-dimension dense/sparse annotations, using
/// the coordinate scheme tensor for the initial contents if provided.		/// the coordinate scheme tensor for the initial contents if provided.
///		///
/// Precondition: `perm` and `sparsity` must be valid for `szs.size()`.		/// Precondition: `perm` and `sparsity` must be valid for `szs.size()`.
SparseTensorStorage(const std::vector<uint64_t> &szs, const uint64_t *perm,		SparseTensorStorage(const std::vector<uint64_t> &szs, const uint64_t *perm,
const DimLevelType *sparsity,		const DimLevelType *sparsity,
SparseTensorCOO<V> *coo = nullptr)		SparseTensorCOO<V> *coo = nullptr)
: SparseTensorStorageBase(szs, perm, sparsity), pointers(getRank()),		: Base(szs, perm, sparsity), pointers(Base::getRank()),
indices(getRank()), idx(getRank()) {		indices(Base::getRank()), idx(Base::getRank()) {
const uint64_t rank = getRank();		const uint64_t rank = Base::getRank();
// Provide hints on capacity of pointers and indices.		// Provide hints on capacity of pointers and indices.
// TODO: needs fine-tuning based on sparsity		// TODO: needs fine-tuning based on sparsity
bool allDense = true;		bool allDense = true;
uint64_t sz = 1;		uint64_t sz = 1;
for (uint64_t r = 0; r < rank; r++) {		for (uint64_t r = 0; r < rank; r++) {
sz = checkedMul(sz, getDimSizes()[r]);		sz = checkedMul(sz, Base::getDimSizes()[r]);
if (isCompressedDim(r)) {		if (Base::isCompressedDim(r)) {
pointers[r].reserve(sz + 1);		pointers[r].reserve(sz + 1);
pointers[r].push_back(0);		pointers[r].push_back(0);
indices[r].reserve(sz);		indices[r].reserve(sz);
sz = 1;		sz = 1;
allDense = false;		allDense = false;
}		}
}		}
// Then assign contents from coordinate scheme tensor if provided.		// Then assign contents from coordinate scheme tensor if provided.
if (coo) {		if (coo) {
// Ensure both preconditions of `fromCOO`.		// Ensure both preconditions of `fromCOO`.
assert(coo->getSizes() == getDimSizes() && "Tensor size mismatch");		assert(coo->getSizes() == Base::getDimSizes() && "Tensor size mismatch");
coo->sort();		coo->sort();
// Now actually insert the `elements`.		// Now actually insert the `elements`.
const std::vector<Element<V>> &elements = coo->getElements();		const std::vector<Element<V>> &elements = coo->getElements();
uint64_t nnz = elements.size();		uint64_t nnz = elements.size();
values.reserve(nnz);		values.reserve(nnz);
fromCOO(elements, 0, nnz, 0);		fromCOO(elements, 0, nnz, 0);
} else if (allDense) {		} else if (allDense) {
values.resize(sz, 0);		values.resize(sz, 0);
}		}
}		}

~SparseTensorStorage() override = default;		~SparseTensorStorage() override = default;

/// Partially specialize these getter methods based on template types.		/// Partially specialize these getter methods based on template types.
void getPointers(std::vector<P> **out, uint64_t d) override {		void getPointers(std::vector<P> **out, uint64_t d) override {
assert(d < getRank());		assert(d < Base::getRank());
*out = &pointers[d];		*out = &pointers[d];
}		}
void getIndices(std::vector<I> **out, uint64_t d) override {		void getIndices(std::vector<I> **out, uint64_t d) override {
assert(d < getRank());		assert(d < Base::getRank());
*out = &indices[d];		*out = &indices[d];
}		}
void getValues(std::vector<V> **out) override { *out = &values; }		void getValues(std::vector<V> **out) override { *out = &values; }

/// Partially specialize lexicographical insertions based on template types.		/// Partially specialize lexicographical insertions based on template types.
void lexInsert(const uint64_t *cursor, V val) override {		void lexInsert(const uint64_t *cursor, V val) override {
// First, wrap up pending insertion path.		// First, wrap up pending insertion path.
uint64_t diff = 0;		uint64_t diff = 0;
Show All 12 Lines	public:
/// to all-zero/false while only iterating over the nonzero elements.		/// to all-zero/false while only iterating over the nonzero elements.
void expInsert(uint64_t *cursor, V *values, bool *filled, uint64_t *added,		void expInsert(uint64_t *cursor, V *values, bool *filled, uint64_t *added,
uint64_t count) override {		uint64_t count) override {
if (count == 0)		if (count == 0)
return;		return;
// Sort.		// Sort.
std::sort(added, added + count);		std::sort(added, added + count);
// Restore insertion path for first insert.		// Restore insertion path for first insert.
const uint64_t lastDim = getRank() - 1;		const uint64_t lastDim = Base::getRank() - 1;
uint64_t index = added[0];		uint64_t index = added[0];
cursor[lastDim] = index;		cursor[lastDim] = index;
lexInsert(cursor, values[index]);		lexInsert(cursor, values[index]);
assert(filled[index]);		assert(filled[index]);
values[index] = 0;		values[index] = 0;
filled[index] = false;		filled[index] = false;
// Subsequent insertions are quick.		// Subsequent insertions are quick.
for (uint64_t i = 1; i < count; i++) {		for (uint64_t i = 1; i < count; i++) {
Show All 14 Lines	void endInsert() override {
else		else
endPath(0);		endPath(0);
}		}

/// Returns this sparse tensor storage scheme as a new memory-resident		/// Returns this sparse tensor storage scheme as a new memory-resident
/// sparse tensor in coordinate scheme with the given dimension order.		/// sparse tensor in coordinate scheme with the given dimension order.
///		///
/// Precondition: `perm` must be valid for `getRank()`.		/// Precondition: `perm` must be valid for `getRank()`.
SparseTensorCOO<V> *toCOO(const uint64_t *perm) {		SparseTensorCOO<V> *toCOO(const uint64_t *perm) const {
// Restore original order of the dimension sizes and allocate coordinate		SparseTensorEnumerator<V> enumerator(*this, Base::getRank(), perm);
// scheme with desired new ordering specified in perm.		SparseTensorCOO<V> *coo =
const uint64_t rank = getRank();		new SparseTensorCOO<V>(enumerator.permutedSizes(), values.size());
const auto &rev = getRev();		enumerator.forallElements([&coo](const std::vector<uint64_t> &ind, V val) {
aartbik: Given that toCOO() is rather central to a lot of previously measured performance operations, can you please do a before/after measurement with some large tensors (see e.g. our pre-print paper), just to make sure that the use of an ElementConsumer callback does not introduce too much overhead?
wrengr (author): Will do. Can you send me the shell script you used for running those experiments?
aartbik: I don't think any of my experiments directly apply, since I measured reading tensors. I think you will need to tightly time some of the conversions, in particular around the toCOO method. I am just curious if the function call shows up at all or not. I find `auto t_start = std::chrono::steady_clock::now(); ... auto t_end = std::chrono::steady_clock::now();` extremely useful to time tight sections of code, and report very specific timings for a small set of instructions.
wrengr (author): Ah, I was hoping it was scripted away rather than requiring such invasive checking. I'll see if I can't find some decently reliable way to test it. (I'm used to using https://github.com/haskell/criterion which handles all manner of complications re proper benchmarking; though I've no idea whether C++ has anything remotely analogous.)
wrengr (author): I seem to have finally made the benchmarks report more stable numbers. There's still more variation than I would like, but they reliably show this differential to have the same (or better) performance. Only in rare cases is there any regression, and those cases are still <1%.
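Editorial sketch of the timing approach suggested in this thread. It is not the benchmark that was actually run for this revision. `timeSectionNs`, `sumDirect`, and `toyForallElements` are hypothetical names introduced only for illustration; the toy workload stands in for a direct loop versus a loop that routes every element through a callback, which is the kind of overhead the reviewer asked about.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper, not part of MLIR: runs `body` once and returns the
// elapsed time in nanoseconds as measured by a monotonic clock.
template <typename F>
std::int64_t timeSectionNs(F &&body) {
  const auto tStart = std::chrono::steady_clock::now();
  body();
  const auto tEnd = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(tEnd - tStart)
      .count();
}

// Toy stand-in for the old style: the loop body does the work directly.
std::uint64_t sumDirect(const std::vector<std::uint64_t> &xs) {
  std::uint64_t total = 0;
  for (std::uint64_t x : xs)
    total += x;
  return total;
}

// Toy stand-in for the new style: every element goes through a callback.
void toyForallElements(const std::vector<std::uint64_t> &xs,
                       const std::function<void(std::uint64_t)> &yield) {
  assert(yield && "callback must be callable");
  for (std::uint64_t x : xs)
    yield(x);
}
```

To compare the two styles, wrap each call in `timeSectionNs`, repeat the measurement several times, and compare the distributions rather than single runs, since individual timings of short sections tend to be noisy.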
const auto &sizes = getDimSizes();		coo->add(ind, val);
std::vector<uint64_t> orgsz(rank);		});
for (uint64_t r = 0; r < rank; r++)
aartbik: I still find the original code, which first restores the original sizes, a bit more intuitive (during debugging, I always check the factory point as entry to see if all is right), and it allows us to use the factory method newSparseTensorCOO exclusively, rather than introducing a direct new here. Given how much overhead you introduce elsewhere, is saving the extra loop really worth it?
wrengr (author): This part of the reorg isn't about improving performance (though that is a consequence), it's about having the right abstractions. I don't think `newSparseTensorCOO` is a particularly good abstraction. The only thing it does is construct the pushforward array (which is itself a very good abstraction that's also used in several other places), assert non-zero sizes (which can be abstracted into several other places), and then call the constructor. If we factor out the code for constructing pushforward arrays, then `newSparseTensorCOO` is just a macro: `newSparseTensorCOO(rank, sizes, perm, cap) === new SparseTensorCOO(pushforward(rank, perm, sizes), cap)`. Since `SparseTensorEnumerator` must already construct the pushforward array for its own reasons, I fail to see what value `newSparseTensorCOO` adds that would make it worth intentionally duplicating that work. Note that this is rather different from the situation with `newSparseTensor`, because `newSparseTensor` does relatively unique things like comparing the runtime sizes against the static shape, and in D122061 it's overloaded on the type of the final argument. I'm still not convinced that it's the best abstraction (e.g., because `openSparseTensorCOO` also compares the runtime sizes against the static shape, which suggests there's a better place to draw the abstraction boundary), but it is at the very least a non-trivial abstraction.
aartbik: Yeah, I am not pushing back against this part of the change per se, I just like the invariant that I had in newSparseTensorCOO during debugging. I can get used to looking for another invariant in the new constructor ;-)
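Editorial sketch of the "pushforward array" mentioned in this thread. The thread names `pushforward(rank, perm, sizes)` but does not show its body, so the definition below is an assumption: it takes the pushforward to be the array `out` with `out[perm[r]] = sizes[r]`. The vector-based signature is also hypothetical and chosen only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the helper discussed above.
// Assumption: the pushforward of `sizes` along `perm` places the size of
// dimension `r` at position `perm[r]` of the result.
std::vector<std::uint64_t>
pushforward(const std::vector<std::uint64_t> &perm,
            const std::vector<std::uint64_t> &sizes) {
  assert(perm.size() == sizes.size() && "rank mismatch");
  const std::uint64_t rank = perm.size();
  std::vector<std::uint64_t> out(rank);
  for (std::uint64_t r = 0; r < rank; r++) {
    assert(perm[r] < rank && "permutation entry is out of range");
    out[perm[r]] = sizes[r];
  }
  return out;
}
```

Under that assumption, `perm = {2, 0, 1}` and `sizes = {10, 20, 30}` give `{20, 30, 10}`, and the factory method reduces to calling the constructor on the result, which is the point being argued in the thread.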
orgsz[rev[r]] = sizes[r];
SparseTensorCOO<V> *coo = SparseTensorCOO<V>::newSparseTensorCOO(
rank, orgsz.data(), perm, values.size());
// Populate coordinate scheme restored from old ordering and changed with
// new ordering. Rather than applying both reorderings during the recursion,
// we compute the combine permutation in advance.
std::vector<uint64_t> reord(rank);
for (uint64_t r = 0; r < rank; r++)
reord[r] = perm[rev[r]];
toCOO(*coo, reord, 0, 0);
// TODO: This assertion assumes there are no stored zeros,		// TODO: This assertion assumes there are no stored zeros,
// or if there are then that we don't filter them out.		// or if there are then that we don't filter them out.
// Cf., <https://github.com/llvm/llvm-project/issues/54179>		// Cf., <https://github.com/llvm/llvm-project/issues/54179>
assert(coo->getElements().size() == values.size());		assert(coo->getElements().size() == values.size());
return coo;		return coo;
}		}

/// Factory method. Constructs a sparse tensor storage scheme with the given		/// Factory method. Constructs a sparse tensor storage scheme with the given
[25 lines collapsed; context: public:]
}		}

private:		private:
/// Appends an arbitrary new position to `pointers[d]`. This method		/// Appends an arbitrary new position to `pointers[d]`. This method
/// checks that `pos` is representable in the `P` type; however, it		/// checks that `pos` is representable in the `P` type; however, it
/// does not check that `pos` is semantically valid (i.e., larger than		/// does not check that `pos` is semantically valid (i.e., larger than
/// the previous position and smaller than `indices[d].capacity()`).		/// the previous position and smaller than `indices[d].capacity()`).
inline void appendPointer(uint64_t d, uint64_t pos, uint64_t count = 1) {		inline void appendPointer(uint64_t d, uint64_t pos, uint64_t count = 1) {
assert(isCompressedDim(d));		assert(Base::isCompressedDim(d));
assert(pos <= std::numeric_limits<P>::max() &&		assert(pos <= std::numeric_limits<P>::max() &&
"Pointer value is too large for the P-type");		"Pointer value is too large for the P-type");
pointers[d].insert(pointers[d].end(), count, static_cast<P>(pos));		pointers[d].insert(pointers[d].end(), count, static_cast<P>(pos));
}		}

/// Appends index `i` to dimension `d`, in the semantically general		/// Appends index `i` to dimension `d`, in the semantically general
/// sense. For non-dense dimensions, that means appending to the		/// sense. For non-dense dimensions, that means appending to the
/// `indices[d]` array, checking that `i` is representable in the `I`		/// `indices[d]` array, checking that `i` is representable in the `I`
/// type; however, we do not verify other semantic requirements (e.g.,		/// type; however, we do not verify other semantic requirements (e.g.,
/// that `i` is in bounds for `sizes[d]`, and not previously occurring		/// that `i` is in bounds for `sizes[d]`, and not previously occurring
/// in the same segment). For dense dimensions, this method instead		/// in the same segment). For dense dimensions, this method instead
/// appends the appropriate number of zeros to the `values` array,		/// appends the appropriate number of zeros to the `values` array,
/// where `full` is the number of "entries" already written to `values`		/// where `full` is the number of "entries" already written to `values`
/// for this segment (aka one after the highest index previously appended).		/// for this segment (aka one after the highest index previously appended).
void appendIndex(uint64_t d, uint64_t full, uint64_t i) {		void appendIndex(uint64_t d, uint64_t full, uint64_t i) {
if (isCompressedDim(d)) {		if (Base::isCompressedDim(d)) {
assert(i <= std::numeric_limits<I>::max() &&		assert(i <= std::numeric_limits<I>::max() &&
"Index value is too large for the I-type");		"Index value is too large for the I-type");
indices[d].push_back(static_cast<I>(i));		indices[d].push_back(static_cast<I>(i));
} else { // Dense dimension.		} else { // Dense dimension.
assert(i >= full && "Index was already filled");		assert(i >= full && "Index was already filled");
if (i == full)		if (i == full)
return; // Short-circuit, since it'll be a nop.		return; // Short-circuit, since it'll be a nop.
if (d + 1 == getRank())		if (d + 1 == Base::getRank())
values.insert(values.end(), i - full, 0);		values.insert(values.end(), i - full, 0);
else		else
finalizeSegment(d + 1, 0, i - full);		finalizeSegment(d + 1, 0, i - full);
}		}
}		}

/// Initializes sparse tensor storage scheme from a memory-resident sparse		/// Initializes sparse tensor storage scheme from a memory-resident sparse
/// tensor in coordinate scheme. This method prepares the pointers and		/// tensor in coordinate scheme. This method prepares the pointers and
/// indices arrays under the given per-dimension dense/sparse annotations.		/// indices arrays under the given per-dimension dense/sparse annotations.
///		///
/// Preconditions:		/// Preconditions:
/// (1) the `elements` must be lexicographically sorted.		/// (1) the `elements` must be lexicographically sorted.
/// (2) the indices of every element are valid for `sizes` (equal rank		/// (2) the indices of every element are valid for `sizes` (equal rank
/// and pointwise less-than).		/// and pointwise less-than).
void fromCOO(const std::vector<Element<V>> &elements, uint64_t lo,		void fromCOO(const std::vector<Element<V>> &elements, uint64_t lo,
uint64_t hi, uint64_t d) {		uint64_t hi, uint64_t d) {
		const uint64_t rank = Base::getRank();
		assert(d <= rank && hi <= elements.size());
// Once dimensions are exhausted, insert the numerical values.		// Once dimensions are exhausted, insert the numerical values.
assert(d <= getRank() && hi <= elements.size());		if (d == rank) {
if (d == getRank()) {
assert(lo < hi);		assert(lo < hi);
values.push_back(elements[lo].value);		values.push_back(elements[lo].value);
return;		return;
}		}
// Visit all elements in this interval.		// Visit all elements in this interval.
uint64_t full = 0;		uint64_t full = 0;
while (lo < hi) { // If `hi` is unchanged, then `lo < elements.size()`.		while (lo < hi) { // If `hi` is unchanged, then `lo < elements.size()`.
// Find segment in interval with same index elements in this dimension.		// Find segment in interval with same index elements in this dimension.
uint64_t i = elements[lo].indices[d];		uint64_t i = elements[lo].indices[d];
uint64_t seg = lo + 1;		uint64_t seg = lo + 1;
while (seg < hi && elements[seg].indices[d] == i)		while (seg < hi && elements[seg].indices[d] == i)
seg++;		seg++;
// Handle segment in interval for sparse or dense dimension.		// Handle segment in interval for sparse or dense dimension.
appendIndex(d, full, i);		appendIndex(d, full, i);
full = i + 1;		full = i + 1;
fromCOO(elements, lo, seg, d + 1);		fromCOO(elements, lo, seg, d + 1);
// And move on to next segment in interval.		// And move on to next segment in interval.
lo = seg;		lo = seg;
}		}
// Finalize the sparse pointer structure at this dimension.		// Finalize the sparse pointer structure at this dimension.
finalizeSegment(d, full);		finalizeSegment(d, full);
}		}

/// Stores the sparse tensor storage scheme into a memory-resident sparse
/// tensor in coordinate scheme.
void toCOO(SparseTensorCOO<V> &tensor, std::vector<uint64_t> &reord,
uint64_t pos, uint64_t d) {
assert(d <= getRank());
if (d == getRank()) {
assert(pos < values.size());
tensor.add(idx, values[pos]);
} else if (isCompressedDim(d)) {
// Sparse dimension.
for (uint64_t ii = pointers[d][pos]; ii < pointers[d][pos + 1]; ii++) {
idx[reord[d]] = indices[d][ii];
toCOO(tensor, reord, ii, d + 1);
}
} else {
// Dense dimension.
const uint64_t sz = getDimSizes()[d];
const uint64_t off = pos * sz;
for (uint64_t i = 0; i < sz; i++) {
idx[reord[d]] = i;
toCOO(tensor, reord, off + i, d + 1);
}
}
}

/// Finalize the sparse pointer structure at this dimension.		/// Finalize the sparse pointer structure at this dimension.
void finalizeSegment(uint64_t d, uint64_t full = 0, uint64_t count = 1) {		void finalizeSegment(uint64_t d, uint64_t full = 0, uint64_t count = 1) {
if (count == 0)		if (count == 0)
return; // Short-circuit, since it'll be a nop.		return; // Short-circuit, since it'll be a nop.
if (isCompressedDim(d)) {		if (Base::isCompressedDim(d)) {
appendPointer(d, indices[d].size(), count);		appendPointer(d, indices[d].size(), count);
} else { // Dense dimension.		} else { // Dense dimension.
const uint64_t sz = getDimSizes()[d];		const uint64_t sz = Base::getDimSizes()[d];
assert(sz >= full && "Segment is overfull");		assert(sz >= full && "Segment is overfull");
// Assuming we checked for overflows in the constructor, then this		// Assuming we checked for overflows in the constructor, then this
// multiply will never overflow.		// multiply will never overflow.
count *= (sz - full);		count *= (sz - full);
// For dense storage we must enumerate all the remaining coordinates		// For dense storage we must enumerate all the remaining coordinates
// in this dimension (i.e., coordinates after the last non-zero		// in this dimension (i.e., coordinates after the last non-zero
// element), and either fill in their zero values or else recurse		// element), and either fill in their zero values or else recurse
// to finalize some deeper dimension.		// to finalize some deeper dimension.
if (d + 1 == getRank())		if (d + 1 == Base::getRank())
values.insert(values.end(), count, 0);		values.insert(values.end(), count, 0);
else		else
finalizeSegment(d + 1, 0, count);		finalizeSegment(d + 1, 0, count);
}		}
}		}

/// Wraps up a single insertion path, inner to outer.		/// Wraps up a single insertion path, inner to outer.
void endPath(uint64_t diff) {		void endPath(uint64_t diff) {
uint64_t rank = getRank();		const uint64_t rank = Base::getRank();
assert(diff <= rank);		assert(diff <= rank);
for (uint64_t i = 0; i < rank - diff; i++) {		for (uint64_t i = 0; i < rank - diff; i++) {
const uint64_t d = rank - i - 1;		const uint64_t d = rank - i - 1;
finalizeSegment(d, idx[d] + 1);		finalizeSegment(d, idx[d] + 1);
}		}
}		}

/// Continues a single insertion path, outer to inner.		/// Continues a single insertion path, outer to inner.
void insPath(const uint64_t *cursor, uint64_t diff, uint64_t top, V val) {		void insPath(const uint64_t *cursor, uint64_t diff, uint64_t top, V val) {
uint64_t rank = getRank();		const uint64_t rank = Base::getRank();
assert(diff < rank);		assert(diff < rank);
for (uint64_t d = diff; d < rank; d++) {		for (uint64_t d = diff; d < rank; d++) {
uint64_t i = cursor[d];		uint64_t i = cursor[d];
appendIndex(d, top, i);		appendIndex(d, top, i);
top = 0;		top = 0;
idx[d] = i;		idx[d] = i;
}		}
values.push_back(val);		values.push_back(val);
}		}

/// Finds the lexicographic differing dimension.		/// Finds the lexicographic differing dimension.
uint64_t lexDiff(const uint64_t *cursor) const {		uint64_t lexDiff(const uint64_t *cursor) const {
for (uint64_t r = 0, rank = getRank(); r < rank; r++)		for (uint64_t r = 0, rank = Base::getRank(); r < rank; r++)
if (cursor[r] > idx[r])		if (cursor[r] > idx[r])
return r;		return r;
else		else
assert(cursor[r] == idx[r] && "non-lexicographic insertion");		assert(cursor[r] == idx[r] && "non-lexicographic insertion");
assert(0 && "duplication insertion");		assert(0 && "duplication insertion");
return -1u;		return -1u;
}		}

		protected:
aartbik: Does this need to be protected, or can it even be private, given that you "friend" this?
wrengr (author): Yeah, they can be private. I just used protected because that seems more legible to me (oh how I wish there was a way to restrict "friendship" to only these specific methods).
aartbik: Ah, I see your point in making it apply to just one section. But yeah, this is fine too, less confusion than seeing a protected section.
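Editorial sketch of the access pattern discussed in this thread: accessor methods kept out of the public interface but reachable by one designated friend class. `ToyStorage` and `ToyEnumerator` are hypothetical, simplified stand-ins and not the classes in this diff. As noted in the thread, C++ friendship covers the whole class, so the friend can also see the private data member, not only the accessor.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

class ToyEnumerator; // Forward declaration so it can be named as a friend.

class ToyStorage {
public:
  explicit ToyStorage(std::vector<double> vals) : values(std::move(vals)) {}
  std::uint64_t size() const { return values.size(); }

private:
  // Only `ToyEnumerator` may call the accessor below; other client code
  // cannot, because the accessor is private.
  friend class ToyEnumerator;
  double getValue(std::uint64_t pos) const {
    assert(pos < values.size() && "Value position is out of bounds");
    return values[pos];
  }
  std::vector<double> values;
};

class ToyEnumerator {
public:
  explicit ToyEnumerator(const ToyStorage &s) : storage(s) {}
  // Walks all stored values through the private accessor.
  double sum() const {
    double total = 0;
    for (std::uint64_t pos = 0; pos < storage.size(); pos++)
      total += storage.getValue(pos);
    return total;
  }

private:
  const ToyStorage &storage; // The storage must outlive the enumerator.
};
```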
		// Allow `SparseTensorEnumerator` to access these methods without
		// making them public to other client code.
		friend class SparseTensorEnumerator<V>;
		V getValue(uint64_t pos) const override {
		assert(pos < values.size() && "Value position is out of bounds");
		return values[pos];
		}
		uint64_t getPointer(uint64_t d, uint64_t parentPos) const override {
		assert(Base::isCompressedDim(d)); // Entails `d < getRank()`.
		assert(parentPos < pointers[d].size() &&
		"Pointer position is out of bounds");
		return pointers[d][parentPos]; // Converts the stored `P` into `uint64_t`.
		}
		uint64_t getIndex(uint64_t d, uint64_t pos) const override {
		assert(Base::isCompressedDim(d)); // Entails `d < getRank()`.
		assert(pos < indices[d].size() && "Index position is out of bounds");
		return indices[d][pos]; // Converts the stored `I` into `uint64_t`.
		}

private:		private:
std::vector<std::vector<P>> pointers;		std::vector<std::vector<P>> pointers;
std::vector<std::vector<I>> indices;		std::vector<std::vector<I>> indices;
std::vector<V> values;		std::vector<V> values;
std::vector<uint64_t> idx; // index cursor for lexicographic insertion.		std::vector<uint64_t> idx; // index cursor for lexicographic insertion.
};		};

/// Helper to convert string to lower case.		/// Helper to convert string to lower case.
[714 lines collapsed]


[mlir][sparse] Factoring out an enumerator over elements of SparseTensorStorage (Closed, Public)

Diff 420368

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
