By using a shared index pool, we reduce the footprint of each "Element"
in the COO scheme and also reduce allocation overhead: instead of
allocating many small per-element vectors of indices, all indices live
in a single shared vector. When the capacity is known up front, this
means *all* allocation can be done in advance.
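To make the layout concrete, here is a minimal sketch of the idea. The
names (Element, CooStorage, IndexPool-style fields) are hypothetical and
not the actual types in this codebase; the point is only that each
element stores an offset/length into one shared index buffer rather than
owning its own vector, and that both buffers can be reserved once when
the nonzero count is known.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical sketch: previously each element owned its own vector of
    // indices; with a shared pool it stores only an offset and a count into
    // one contiguous buffer owned by the container.
    struct Element {
        std::size_t offset;   // start of this element's indices in the pool
        std::uint32_t count;  // number of indices belonging to this element
        double value;
    };

    struct CooStorage {
        std::vector<std::uint64_t> index_pool;  // shared by all elements
        std::vector<Element> elements;

        // When the number of nonzeros is known in advance, both buffers can
        // be reserved once, so reading performs no further allocations.
        void reserve(std::size_t nnz, std::size_t indices_per_element) {
            elements.reserve(nnz);
            index_pool.reserve(nnz * indices_per_element);
        }

        // Append one element together with its indices.
        void push(const std::uint64_t* idx, std::uint32_t count, double value) {
            Element e{index_pool.size(), count, value};
            index_pool.insert(index_pool.end(), idx, idx + count);
            elements.push_back(e);
        }
    };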
This is a big win. For example, reading the matrix SK-2005
(50,636,154 x 50,636,154 with 1,949,412,601 nonzero elements) improves
as follows (times in ms), about 3.5x faster overall:
SK-2005        before          after           speedup
---------------------------------------------
read             305,086.65      180,318.12    1.69
sort           2,836,096.23      510,492.87    5.56
pack             364,485.67      312,009.96    1.17
---------------------------------------------
TOTAL          3,505,668.56    1,002,820.95    3.50
Nice change! Can we document this design decision here, i.e. why a pointer into the shared pool is used instead of a per-element vector?