This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Replace Element with ElementId for sorting.
Needs ReviewPublic

Authored by bixia on Mar 27 2023, 4:12 PM.

Details

Reviewers
wrengr
aartbik
Summary

Previously, we represented a COO element with a pointer to the coordinates and a
value. We use this representing for sorting. Such a representation
unnecessarily involves the element in the compare function as well as in the
swap function for sorting.

We now replace this Element representation with ElementId, which is the index
of the element value in the values-array.

This improves read-CSC-arabic2005-lib by more than 10%.

Diff Detail

Event Timeline

bixia created this revision.Mar 27 2023, 4:12 PM
Herald added a project: Restricted Project. · View Herald Transcript
bixia requested review of this revision.Mar 27 2023, 4:12 PM

Nit: in the CL summary it should be "Previously, we represented a"

mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
31

"at indices"

49–50

I think it would be nicer to keep the Element class around, since it's still helpful to have a struct that gives better names for things than std::pair<ElementId, V> does. Although the new Element class can't give the user the uint64_t const* coords pointer, without also storing a reference to the enclosing SparseTensorCOO, that's not too bad of a problem since the old Element class already had issues because it did not store the rank. (And since the ElementConsumer already unpacks the Element class for its arguments, it would be easy enough to change the new Element class to store that reference.)

53–54

This should be named coordinates since it's not just the set of coordinates for a single element, which is why that's the named used by SparseTensorCOO as well. (This is different than the coords pointer of the Element class, which is indeed the set of coordinates for that single element.)

59–61

Much as I love consting all the things, unfortunately MLIR style says that scalar function parameters shouldn't use const (only pointers, references, etc may use it).

63–65

Since e1, e2, and rank are all constant, it would be better to define uint64_t const* const coords1 = coordinates + e1 * rank; uint64_t const* const coords2 = coordinates + e2 * rank; and then use if (coords1[d] == coords2[d]) etc. Although this may help performance due to LICM, the real reason I suggest it is that it aids code clarity by removing redundancies. Moreover, it helps ensure correctness despite making the parameters non-const as per my other comment.

(N.B., for this change, the two local variables are called "coords" because they're the collection of coordinates for a single element.)

(N.B., the two consts are important there. The the const* one is required by the fact that ElementLT::coordinates is const*. Whereas the const coordsN one is what ensures correctness: by making the coordsN local variable immutable itself. Personally I prefer the "East const style" https://isocpp.org/wiki/faq/const-correctness#const-ref-alt because it's more consistant/logical to parse; but since MLIR uses "West const style" everywhere else, feel free to use "const uint64_t * const coordsN" instead if you prefer.)

wrengr added inline comments.Mar 28 2023, 4:30 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
70

as mentioned earlier, this field should be named "coordinates"

111

This change needs to be reverted.

The value_type alias is part of the "Container" concept which is used by the C++-style iterator interface requested by Peter Gavin. The SparseTensorCOO class is not a container of ids, it's a container of elements, and therefore should be iterated as such.

113–114

This change needs to be reverted for the same reason.

116–167

The blank line should remain.

wrengr added inline comments.Mar 28 2023, 5:03 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
49–50

See my other followup comment, re preserving the Element<V> class for the iterator interface of SparseTensorCOO.

111

As a followup to this, since the Element<V> class is needed by the iterator interface and since clients can't convert an ElementId into a uint64_t const* coords pointer, that means we can't have the new Element<V> class just store an ElementId in lieu of the old pointer. I see two obvious solutions:

(1) Keep the Element<V> class exactly as it was before, and adjust the iterator interface to construct the uint64_t const* coords pointers at the time the Element<V> needs to be returned to the client. The main benefit of this approach is that it avoids any breaking changes to client code. The downside is that it might be tricky to construct the pointer when advancing from one element to the next. (I haven't looked that far down this CL to see if it actually would be tricky or not.)

(2) Update the Element<V> class to store all three of { SparseTensorCOO const& coo; const ElementId elemId; V value} and provide a method uint64_t const* coords() const which uses the coo to compute the correct pointer from the elemId. The main benefit of this approach is that it helps the Element<V> be more stable to mutations of the arrays stored in the coo. The downside is that it makes the struct larger.

wrengr added inline comments.Mar 28 2023, 5:30 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
209

This should use the ElementId type. (I'm pretty sure)

Since element identifiers are the thing that's stable under sorting etc, they're the thing that client code will store and pass around as the primary handle on an element. Therefore, this method should be named getValue since it's the primary method; whereas the method currently named that should be renamed to something else like getValueAtPosition or getNthValue.

210

Ditto my comment at getValueForId: the parameter should use the ElementId type, and this method should take the name getCoords.

214

To avoid confusion with the method currently named getValueForId, you should define a second typedef named something like ElementPosition or NthElement, and use that typedef for the argument. Also, you should use different variable names to help distinguish when something is supposed to be an ElementId vs an NthElement (e.g., using i for the former but n for the latter).

Since element identifiers are the thing that's stable under sorting etc, they're the thing that client code will store and pass around as the primary handle on an element. Therefore, this method should not take the name getValue since it's not the primary method client code will want to use. Instead this method should be renamed to something else like getValueAtPosition or getNthValue, so that the method currently named getValueForId can take the name getValue.

215

Ditto my comment at getValue: this method should instead be getNthCoords(NthElement n) or similar.

wrengr added inline comments.Mar 28 2023, 6:11 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
211

Why are you calling .data() before the operator[] instead of just using the operator[] directly on coordinates?

250

you should use ElementId elemId instead, since that's what the variable actually means in this context.

251

Nit: you can/should use elemId != 0 here instead, to avoid the cost of the additional method call.

256–262

As mentioned earlier, the iterator interface really does need to return Element<V> rather than ElementId. Since we no longer store a vector_type i.e. std::vector<Element<V>>, that means you'll need to define a new SparseTensorCOO<V>::Iterator class and use that in lieu of the vector_type typedef when defining the {iterator, const_iterator, difference_type, size_type} typedefs.

Defining the iterator class shouldn't be difficult, though there are a lot of fiddly details to making everything work right. Alas, unfortunately you can't use the llvm::iterator_facade_base base class like I do in D146691. So if you're not familiar with all the details of how to define an iterator from scratch, just let me know and I can define one in a separate CL.

277–279

delete this "COO"

277–279

"stores"

mlir/lib/ExecutionEngine/SparseTensorRuntime.cpp
94

Why do you need to store two different iterators?

105

I think it would be better to have this method return Element<V>. Once the SparseTensorCOO<V>::Iterator class is defined, then that class will handle everything that's needed, for constructing the new element. If that class is defined correctly then this method can still be implemented via (it < end ? &*it++ : nullptr) or some slight variation thereof.

In particular, the variation I have in mind is return it < end ? std::make_optional(*it++) : std::nullopt; since that handles the liveness issues associated with SparseTensorCOO<V>::Iterator::operator* constructing a new Element<V> to return. Since the SparseTensorIterator::getNext method is not exported directly but rather is only used within the CAPI macro definitions below, there's no problem with using std::option in lieu of const*

108

Should use pre-increment here, since that's what I use throughout the runtime library. The postincrement of the previous implementation was just so that we could squish everything into the succinct "&*it++"

114–119

It feels wrong to me for this class to need these methods. The redundancy of needing SparseTensorIterator::{getValue,getCoords} in addition to SparseTensorCOO::{getValue,getCoords} should belie the fact that the C++ iterator interface for SparseTensorCOO really ought to be yielding Element<V> rather than ElementId.

bixia edited the summary of this revision. (Show Details)Mar 29 2023, 7:41 AM
bixia updated this revision to Diff 509435.Mar 29 2023, 12:07 PM
bixia marked 25 inline comments as done.

Address review comments.

Nit: in the CL summary it should be "Previously, we represented a"

done.

mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
49–50

Add Element<V> back and use it for the iterator

59–61

Removed const for ElementId

63–65

Thanks! done.

111

Most of the typedefs in this code block aren't really needed anymore, mostly because the iterator is not a std::iterator, I added comment to the new iterator class to explain this.

111

Add a iterator class and use it to enumerate the COO elements.

113–114

See my comment above, this typedef isn't needed anymore.

209

Fixed the type, but didn't rename the function, per offline discussion

210

Fixed the type, but didn't rename the function, per offline discussion.

210

Similar to the above.

211

Fixed.

214

Added ElementPosition type and used it here and other places. Also use different parameter names.

256–262

Fixed this based on the new iterator class.

mlir/lib/ExecutionEngine/SparseTensorRuntime.cpp
94

These fields were modified to use the new iterator class.

105

Change to use std::optional<Element<V>>, which is different from std::iterator, but serves our purpose.

108

This code is removed.

114–119

This is replaced by returning Element<V>.

aartbik added inline comments.Mar 29 2023, 12:30 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
29–43

The only downside is good english, right?
Why did you change this to downsize?!

31

I find (ID+1)*rank a bit more intuitive, but that is subjective

116–164

An iterator?

119

constructs

bixia updated this revision to Diff 509442.Mar 29 2023, 12:31 PM

Replace typedef ElementId with a struct and simplify getValueForId/getCoordsforId with getValue/getCoords.

wrengr added inline comments.Mar 29 2023, 1:27 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
29

please add a comma here

29–30

"An element ID is used for retrieving the element's value and coordinates from the appropriate arrays."

29–43

"downside"

30

"pointer"

30

"underlying"

31

"an element"

33

"or" (and dropping the second "the")

46–47

"the"

111

Most of the typedefs in this code block aren't really needed anymore, mostly because the iterator is not a std::iterator, I added comment to the new iterator class to explain this.

This has nothing to do with std::iterator; these aliases are for the std::iterator_traits template, which is used ubiquitously throughout LLVM/MLIR in order to derive new iterators from old ones. (cf., https://en.cppreference.com/w/cpp/iterator/iterator_traits)

To see an example of why it's important to support that template, take a look at the definition of tensor_loop_id::Iterator in D146691 (line 207 at https://reviews.llvm.org/D146691?id=508266#change-rkcdqIP8Zcjv), and then take a look at the definition of llvm::iterator_adaptor_base and how it uses std::iterator_traits. Although this particular case is just a minor inconvenience, failing to provide the typedefs needed by std::iterator_traits can easily snowball into much bigger issues for client code. Therefore, usability dictates that the iterator class provides those typedefs.

116–164

"An iterator over the elements of the COO in the order given by..."

120

Should make this final.

Also, should use the class keyword since (1) all the data members are private, and (2) it provides a bunch of methods. I say that for stylistic reasons, but apparently on Windows the struct-vs-class keywords affect linkage (according to the -Wmismatched-tags warnings emitted by the phabricator buildbot on Debian).

121

I think this typedef should be named Impl instead, since it's the underlying implementation and that's the name used elsewhere for such things

122–123

I don't think this constructor should be public, since that can easily lead to bugs from clients passing some other arbitrary thing as the second argument.

Instead, you should have the iterator class declare SparseTensorCOO<V> to be a friend. However, there is some trickiness about the ordering of definitions vs declarations in order to get things to compile correctly. For a complete worked example, see the {TensorId, TensorId::Iterator, TensorId::Range} classes in D146693. Or the tldr is:

  1. Have SparseTensorCOO<V> forward declare the iterator class and the begin/end methods.
  2. After the closing "};" of the COO class, then you can define the iterator class.
  3. After the closing "};" of the iterator class, then you can define the SparseTensorCOO<V>::{begin,end} methods.
141

You should also provide == and != since those are what's used when desugaring "for (auto elem : coo)"

147

I think it'd be clearer to name this it or iter

wrengr added inline comments.Mar 29 2023, 3:04 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
31

I find that more intuitive as well. I'd go even further and suggest: [rank*ID .. rank*(ID+1))

wrengr added inline comments.Mar 29 2023, 3:21 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
28–44

Previously, this block of text was documentation for the Element<V> type, hence using the "///" so that the doxygen documentation tooling picks it up. Whereas the present CL changes that to a more general commentary on the whole file rather than something specifically attached to any one type. So I'd suggest adjusting this text in one of the following ways. (I'm not sure which is best, so you may want to take a look at the generated doxygen files and/or see what Aart's preference is)

  1. Leave the text here but change the "///" to a plain "//" comment instead, and add a blank line between the commentary and the ElementId definition.
    • This approach is best if we think these details are only relevant to developers of this library itself, and hence should not be shown to users of the library.
  2. Move the text to the whole-file documentation section (currently at lines 8–14) and keep the "///".
    • This approach is best if we think users of the library should see them, and we think they should see them before seeing the documentation for the various things defined in this file.
  3. Move the text to the documentation for the SparseTensorCOO class and keep the "///".
    • This approach is best if we think users of the library should see them, but we think they should only see them when looking at the documentation for the COO class.
    • If going with this approach, then this text should come after the stuff about the Container concept.
bixia updated this revision to Diff 509701.Mar 30 2023, 9:31 AM
bixia marked 20 inline comments as done.

Address review comments.

bixia added inline comments.Mar 30 2023, 9:37 AM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
28–44

Thanks! I took approach 3. and move it to document SparseTensorCOO.

29–43

downside

111

The iterator we defined here isn't a proper iterator that can have all the normal iterator traits, such as how would you define iter::pointer as this iterator doesn't actually return a pointer to the storage but constructs a temp object and returns it?
On the other hand, we only use this iterator to implement SparseTensorIterator::getNext and we don't need most of t he iterator traits to support this.

120

Add final and change to class.

121

Add Impl to name and also make it private.

122–123

Move the constructor to private and make the iterator declare SparseTensorCOO as a friend.

147

change to iter.

wrengr added inline comments.Mar 30 2023, 12:55 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
111

On the other hand, we only use this iterator to implement SparseTensorIterator::getNext and we don't need most of t he iterator traits to support this.

As I have already said several times: We only use this for implementing SparseTensorIterator::getNext, but this iterator interface is not for us. The C++ iterator interface was added because Peter Gavin specifically requested it. (Although it was something on my todo list for a long time.) The runtime library does not merely serve our own internal use. The runtime library was factored out into a standalone C++ library for the express purpose of being reused by other clients, including the SparseCore team. Because the library is intended to be used by external clients working in C++ rather than through the MLIR dialect, the library must therefore support the needs of those external clients in addition to our own internal needs.

By focusing on the getNext method you're putting the cart before the horse. The C++ iterator interface is the principal interface and should therefore take primacy in matters of design. The only reason the SparseTensorIterator class exists is as a stop-gap to paper over the fact that the MLIR bindings cannot use the C++ iterator directly. More specifically, the only purpose of the SparseTensorIterator class is to pair an iterator with its end-iterator so that _mlir_ciface_getNext##VNAME knows when the end has been reached. If we redesigned the MLIR bindings to avoid that need to pair iterators with their end-iterators, then we could remove the SparseTensorIterator class entirely.

bixia updated this revision to Diff 509826.Mar 30 2023, 2:49 PM
bixia marked an inline comment as done.

Add definitions for iterator trait.

mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
111

Add the using for iterator trait back per offline discussion.

wrengr added inline comments.Mar 30 2023, 3:44 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
30

"coordinates"

31

This class should be final

34–35

For hygiene, the id field shouldn't be public.

Making it private doesn't loose any functionality: since you have operator uint64_t for reading it out, and have the default copy-assignment operator for mutating it.

38

"`ElementId`"

47

This should be "A pointer into the..." (since it's one of several pointers, all of which pointing into a single shared-pool)

95–96

You should either: have the "(2)" start a new line; or, not have the "(1)" start a new line.

(I don't have a strong preference which; though since the "(1)" is so short and the "(2)" is so long, I have a slight preference towards not having them start new lines.)

110

I'm totally fine with this forward declaration, but since you're defining the Iterator class within the SparseTensorCOO<V> definition (i.e., rather than defining the iterator class after closing the definition of the COO class), it would be cleaner to give the Iterator definition first and then give the typedefs afterwards. (Unless there's some style guide thing saying to do typedefs before class definitions?)

111–117

👍

125–126

You should also provide the other typedefs required by https://en.cppreference.com/w/cpp/iterator/iterator_traits. I.e., using value_type = Element<V>; using reference = value_type &; using pointer = void; using iterator_category = std::forward_iterator_tag;.

(N.B., you're only allowed to elide the using pointer = void; definition in C++20; but since MLIR uses C++17, it's required)

128

Once you provide all the typedefs, you can change this to value_type (if you want).

129

Since the ctor isn't marked as explicit, you can change this to return {coo.getCoords(*iter), coo.getValue(*iter)}; (if you want).

157

What I meant was that the entire name should be just "Impl", since the rest of the name is redundant

217–224

Nit: overloads should be grouped together. I.e., the two getValue overloads together, followed by the two getCoords overloads together

aartbik resigned from this revision.Aug 22 2023, 3:25 PM