This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Factoring out `finalizeSegment` and (generic) `appendIndex`
Closed · Public

Authored by wrengr on Mar 28 2022, 5:31 PM.

Details

Summary

This change introduces two new methods, finalizeSegment and a generic appendIndex, and removes three old methods: endDim, appendCurrentPointer, and the old appendIndex. The two new methods better encapsulate their algorithms, allowing us to remove repetitious code in several other places.

Depends On D122435

Diff Detail

Event Timeline

wrengr created this revision. Mar 28 2022, 5:31 PM
wrengr requested review of this revision. Mar 28 2022, 5:31 PM
wrengr updated this revision to Diff 418746. Mar 28 2022, 5:49 PM

cleaning up a comment

aartbik added inline comments. Mar 30 2022, 1:10 PM
mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
434

this should be at the place of endDim, just to make the delta in this review easier to follow.

typically when refactoring:
(1) changes remain in place, so the diff only shows the delta;
(2) code moves happen in a separate CL, with no changes to the moved code.

473

I liked having appendPointer and appendIndex next to each other, but this pushes it way down. Let's keep it at the original place, which also makes the delta for the review a bit easier to follow than when code moves around like this.

wrengr updated this revision to Diff 419260. Mar 30 2022, 2:04 PM
wrengr marked 2 inline comments as done.

Rebasing for changes to D122435, and addressing comments

aartbik added inline comments. Apr 1 2022, 3:44 PM
mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
270

Ah, an overflow check. Please add that as a string at the end to make the semantics clearer.

(and other parts of MLIR will break long before this as I found out earlier ;-)

430

I like this generalization with count as parameter to all appending methods. Just checking, I assume that insert(x.end()) and x.push_back() have similar STL performance?

443

This was of course there before, but for uint64_t I, this is a nop check.
Shall we do the comparison in even higher precision to detect running out of that space?
(Very unlikely of course, but just observing while here.)

493

it would be even shorter if we let appendIndex return the next logical value for full so that this would be

full = appendIndex(d, full, i)

and have all the compressed/dense logic hidden inside the method.

wdyt?

535–542

One of the motivations given for this change is to avoid repeating code. However, in the original code, we only had endDim() pushing zeros into the values array, and I was very used to that invariant. Now both appendIndex and finalizeSegment have this logic. It feels like this part should actually call into appendIndex instead somehow. Do you see a chance to ensure we only add to values in one place?

wrengr marked an inline comment as done. Apr 1 2022, 7:40 PM
wrengr added inline comments.
mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
270

Will do. Since I do this check in a couple of places across future CLs, I think I'll factor it out into a top-level (inline) function too.

Since the division is expensive (and the error condition rare), does MLIR have a cpp flag for "I need extra help debugging, so please turn on even more assertions than usual"?

430

I'm not entirely sure? I've found it notoriously difficult to get much information about STL implementations, since there are multiple vendors/versions of it. The specification https://en.cppreference.com/w/cpp/container/vector/insert requires this usage to be linear in count, but says nothing about the constant factors. I'd imagine in the worst case the overhead when count=1 is just to set up a for-loop that only runs once (since push_back would also have the same check to prove to itself that the capacity doesn't need resizing). For the cases where count is greater than one, this should be a clear win over calling push_back within a loop, since it need only check the capacity once rather than on every iteration, and since it might possibly do tricks like the following...

Ideally the implementation would use something like memset/memcpy to handle this, since those are usually optimized for architecture-specific issues regarding alignment and vectorization. But I'm pretty new to C++, so I'm not sure if a random STL vendor could be relied on to do that sort of thing. Once we have some benchmarking tools in place, we can always check whether it'd be worth it to roll our own implementation.

For this particular function, the count generalization is just so finalizeSegment doesn't need to repeat the P-validity checking logic, whereas the other call-sites in this and future CLs always use the count=1 default. If vec.insert(vec.end(), 1, val) ends up being notably slower than vec.push_back(val), then I can always go back to the version of the code where this was inlined into finalizeSegment.
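
(A minimal standalone sketch of the pattern being weighed here; the container, element type, and count are made up for illustration and are not taken from the patch:)

#include <cstdint>
#include <vector>

int main() {
  std::vector<double> values;
  uint64_t count = 4;

  // Appending count copies one at a time: push_back re-checks the
  // capacity on every iteration.
  for (uint64_t n = 0; n < count; ++n)
    values.push_back(0.0);

  // Appending the same count copies in one call: a single capacity check
  // covers the whole range, and with count == 1 this degenerates to
  // (roughly) the push_back case.
  values.insert(values.end(), count, 0.0);
  return 0;
}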

443

Hopefully when I=uint64_t the assertion will get optimized away entirely (since the compiler should be able to detect that it's always true). Lifting both operands to a higher precision wouldn't change anything: it'd still always be true, just with a few more zeros in front.
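
(For readers of this archive, here is a purely illustrative sketch of the kind of check being discussed; the function name and assertion message are hypothetical, not the actual code at line 443:)

#include <cassert>
#include <cstdint>
#include <limits>

// When I is uint64_t, the right-hand side below is the largest uint64_t
// value, so the comparison is tautologically true and the compiler can
// elide the assertion; widening both sides (say, to a 128-bit type)
// would leave it just as trivially true.
template <typename I>
void assertFitsIn(uint64_t i) {
  assert(i <= std::numeric_limits<I>::max() && "Value too large for type I");
  (void)i; // Avoid an unused-parameter warning when asserts are disabled.
}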

535–542

finalizeSegment only ever pushes a bunch of zeros to values and/or calls appendPointer, exactly like endDim used to. Logically finalizeSegment is doing the same thing as endDim did; it's just doing so more generally and more efficiently. Re generality: endDim could only handle the case of completely empty dimensions (initial full is zero), whereas finalizeSegment can handle cases where the initial full isn't zero, as occurs in endPath and the part of appendIndex that handles insPath. Re efficiency: endDim had a bunch of for-loops to repeatedly call the same function (not depending on the induction variable), whereas finalizeSegment accumulates those repeated calls into its count parameter, so we only call finalizeSegment at most rank *times* (rather than at most rank *depth*), and when we hit the base case all the count-many calls to push_back get combined into a single insert call (thus saving overhead from checking for capacity resizing, improving memory locality, etc.).

Whereas appendIndex will write an arbitrary index to the indices array (if compressed), which is a completely different thing. The main change here over the previous version of appendIndex is just that now it can also handle the case where the dimension has dense storage, in which case we insert the appropriate number of zero values. Previously this situation was handled by insPath, but I think it makes more sense as part of appendIndex, since the zero insertion is logically being done for the same reasons as writing to the indices array. As for code duplication, appendIndex calls finalizeSegment to perform the zero padding (just as previously insPath called endDim to perform the zero padding).

There is only a slight amount of code duplication, namely the one line for handling the d+1==rank case. This occurs because I adjusted the inductive hypothesis so that both finalizeSegment and appendIndex only take in d such that d<rank (matching the convention of most if not all the other methods), as opposed to the previous situation where d<=rank was allowed. While the duplication could be removed by reverting the IH to d<=rank, I think the d<rank IH is the right way to go. What would it mean for client code to call finalizeSegment(rank)? There is no dimension rank, so there are no segments of that dimension to be finalized, so I can't think of anything sensible that it could mean.
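
(For readers without the diff open, the recursion described above might look roughly like the following; the struct, member names, and types are paraphrased from this discussion rather than copied from SparseTensorUtils.cpp, so treat it as an illustration only:)

#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-in for the coordinate-scheme storage being discussed.
struct SparseTensorSketch {
  uint64_t rank;
  std::vector<uint64_t> sizes;                 // per-dimension sizes
  std::vector<bool> compressed;                // true = compressed dimension
  std::vector<std::vector<uint64_t>> pointers; // per-dimension pointers
  std::vector<std::vector<uint64_t>> indices;  // per-dimension indices
  std::vector<double> values;

  // Append count copies of the current end-of-indices position to the
  // pointers array of dimension d (the P-validity checks are elided here).
  void appendPointer(uint64_t d, uint64_t count = 1) {
    pointers[d].insert(pointers[d].end(), count, indices[d].size());
  }

  // Finalize a segment of dimension d whose first full entries were
  // already written, repeated count times.
  void finalizeSegment(uint64_t d, uint64_t full = 0, uint64_t count = 1) {
    if (count == 0)
      return; // Nothing to finalize.
    assert(d < rank && "Dimension is out of bounds");
    if (compressed[d]) {
      appendPointer(d, count); // Close the segment(s) in the pointers array.
    } else {
      // Dense storage: every remaining coordinate of this dimension needs a
      // zero, so fold the repetition into count and either recurse or, at
      // the innermost dimension, emit all the zeros with one insert call.
      count *= sizes[d] - full;
      if (d + 1 == rank)
        values.insert(values.end(), count, 0.0);
      else
        finalizeSegment(d + 1, 0, count);
    }
  }
};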

wrengr updated this revision to Diff 420301. Apr 4 2022, 2:04 PM
wrengr marked 3 inline comments as done.

Addressing comments

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
493

I like the idea, though I'm not entirely sure of the semantics; i.e., how to document the semantics so as to retain the appropriate encapsulation. I've taken a first pass at it, so let me know if you think it'd be better phrased another way.

aartbik added inline comments. Apr 4 2022, 2:45 PM
mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
270

I am not aware of such a flag, but others may know.

But here, rather than passing by reference, how about returning the result so we can do

sz = checkedMulAssign(sz, sizes[r]);

but also, more generally, store the result in something other than "lhs":

other = checkedMulAssign(sz, sizes[r]);

430

Thanks for checking. I am not overly worried, and we can indeed always profile and improve later.

443

Ah, yes, never mind. I was hoping to somehow detect "about to wrap-around" from i to i+1, but here we already deal with the new index value, so that will not work.

460

this is technically index set [full..full+i], but with space for the new current element, right?
I also would not say cardinality of i+1, since that only holds when we start at 0?

534

please put brackets around the subtraction just for my sanity ;-)

wrengr updated this revision to Diff 420340. Apr 4 2022, 4:12 PM
wrengr marked 5 inline comments as done.

addressing comments

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
270

Will do. Though it's worth noting that I phrased it like operator*= rather than operator* for a couple of reasons: (1) I feel like it helps clarify/allow the lhs!=0 precondition, whereas for operator* I think it'd be more appropriate to check whether lhs==0 rather than assuming it; (2) it's really just intended for the case where we're accumulating a bunch of things and thus overflow becomes more likely, rather than being a more general solution for detecting overflows elsewhere.
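
(A minimal sketch of the value-returning helper being converged on here; the name checkedMul, the handling of lhs == 0, and the assertion message are placeholders rather than what actually landed:)

#include <cassert>
#include <cstdint>

// Multiply two sizes, asserting that the product does not overflow.
// Keeping the division inside the assert means the expensive check only
// runs when assertions are enabled.
inline uint64_t checkedMul(uint64_t lhs, uint64_t rhs) {
  assert((lhs == 0 || rhs <= UINT64_MAX / lhs) &&
         "Integer overflow in size computation");
  return lhs * rhs;
}

With that shape, the call-site from the earlier suggestion would read along the lines of sz = checkedMul(sz, sizes[r]);.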

460

Why would it be [full..full+i]? If full means the number of entries already written, then before calling the method we've already written full many entries (for indices [0..full-1]); within the method we write another i-full many entries (for indices [full..i-1]), thus together we've written a total of [0..full-1]++[full..i-1] == [0..i-1].

In any case, this weird off-by-one relationship (between the i parameter to appendIndex and the return value that becomes the full for the next iteration of the while-loop in fromCOO) is what I meant when I complained about the semantics being unclear / hard to explain. If full really meant the number of entries/zeros written, then we would return full + (i - full) == i as the new number of entries/zeros written (which matches the [0..i-1] derived above); but we don't, we return i+1 instead. And if full really meant the next unwritten index (since it starts at 0 in the fromCOO while-loop, yet we haven't written the 0th index yet), then the compressed case should also return i+1 (since the ith index has now been written); but it doesn't, it returns full instead. This inconsistency was already there in the original code, so maybe it's a bug rather than intentional; I was just making sure to retain the previous behavior through this refactoring.

wrengr updated this revision to Diff 420346. Apr 4 2022, 4:38 PM

Re-adjusting the return value of appendIndex.

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
460

Fwiw, I just checked, and having the compressed case return i+1 passes all our tests. Which makes sense, since appendIndex ignores the full for compressed dimensions anyway. Of course, the uniformity of that means we could just have full = i + 1; at the call-site in fromCOO, rather than returning anything from this method.

Conversely, having the dense case return i causes the assertion at the end of toCOO to fail for three tests, and six others fail to FileCheck. (All this was expected, since I played around with that before.)

aartbik accepted this revision. Apr 4 2022, 5:08 PM
aartbik added inline comments.
mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
270

Yeah, I figured there would be limited use cases, but the reference parameter may still surprise a future reader. Thanks for the change.

460

Ai, I hate it when typos obscure my point ;-). I obviously meant that we wrote out [full, i], as in the original for-loop, so I understand that afterwards the full [0..i-1] are written, but one method invocation may write fewer than that if full was > 0 to start with. But since you reverted back to no return, the point is moot.

This revision is now accepted and ready to land. Apr 4 2022, 5:08 PM
wrengr marked 3 inline comments as done. Apr 4 2022, 5:48 PM