This is an archive of the discontinued LLVM Phabricator instance.

Introduce a specialized data structure to be used in a subsequent change
Abandoned · Public

Authored by sanjoy on Sep 29 2017, 3:19 PM.

Details

Reviewers
chandlerc
rsmith
Summary

ChunkedList will be used to store use lists in ScalarEvolution.

Event Timeline

sanjoy created this revision. Sep 29 2017, 3:19 PM

Looks a fair bit like std::deque - what're the tradeoffs between the two?

include/llvm/ADT/ChunkedList.h
100–102

Any operator overload that can be a non-member generally should be, so that conversions on the LHS and RHS get matching handling. (It can still be a friend, which allows an inline definition here, if that's desired/convenient.)

sanjoy added a comment. Oct 9 2017, 5:10 PM

Looks a fair bit like std::deque - what're the tradeoffs between the two?

I don't think this is like a std::deque -- it allows insertion only in one direction. It is probably most similar to std::forward_list<std::array<T, N>>, and frankly I'm not sure if I've gone overboard with implementing a new data structure from scratch. However:

  • Most of the logic is in iteration and figuring out when to add a new chunk to the linked list. This would still remain with std::forward_list<std::array<T, N>>.
  • I'd be less comfortable playing the pointer-offsetting tricks I've used to keep the data structure two words long if I were using std::forward_list. (A rough sketch of the general shape follows below.)
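A minimal sketch of the general shape being described: an append-only list of fixed-size chunks, roughly std::forward_list<std::array<T, N>> with manual bookkeeping. All names here are illustrative, and this sketch deliberately keeps more than two words of header state and skips the pointer-offsetting tricks mentioned above; it is not the patch's ChunkedList.

  // Illustrative only. Requires a default-constructible T; copy/move
  // operations are omitted for brevity.
  template <typename T, unsigned ChunkSize = 8> class SimpleChunkedList {
    struct Chunk {
      T Elems[ChunkSize];
      unsigned Used = 0;      // number of valid elements in this chunk
      Chunk *Next = nullptr;  // next chunk in insertion order
    };

    Chunk *First = nullptr;   // oldest chunk
    Chunk *Last = nullptr;    // chunk currently being filled

  public:
    void push_back(const T &V) {
      if (!Last || Last->Used == ChunkSize) {
        // Current chunk is full (or the list is empty): link in a new chunk.
        Chunk *C = new Chunk();
        if (Last)
          Last->Next = C;
        else
          First = C;
        Last = C;
      }
      Last->Elems[Last->Used++] = V;
    }

    // Visit every element in insertion order.
    template <typename Fn> void forEach(Fn F) const {
      for (const Chunk *C = First; C; C = C->Next)
        for (unsigned I = 0; I < C->Used; ++I)
          F(C->Elems[I]);
    }

    ~SimpleChunkedList() {
      for (Chunk *C = First; C;) {
        Chunk *Next = C->Next;
        delete C;
        C = Next;
      }
    }
  };

Most of the interesting logic lives in push_back (deciding when to start a new chunk) and in iteration, which matches the point made in the first bullet above.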
chandlerc edited edge metadata. Oct 10 2017, 10:31 AM

Have you considered building a ChunkedVector instead of a ChunkedList? Specifically, there is a great trick where you use a single index with the low bits being an index into the chunk and the high bits being an index into a vector of pointers. It has many of the benefits you list and is a bit simpler I think. It also supports essentially the entire vector API if desired. Both bi-directional and even random access are reasonably efficient. Good locality, etc.
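A hedged illustration of that index-splitting trick (the class name, chunk size, and the use of std::vector for the chunk pointers are placeholders, not an actual LLVM container):

  #include <cstddef>
  #include <vector>

  // Illustrative only. A single index is split into a chunk number (high
  // bits) and an offset within that chunk (low bits).
  template <typename T, unsigned ChunkSizeLog2 = 6> class SketchChunkedVector {
    static constexpr std::size_t ChunkSize = std::size_t(1) << ChunkSizeLog2;
    static constexpr std::size_t OffsetMask = ChunkSize - 1;

    std::vector<T *> Chunks; // one fixed-size heap allocation per entry
    std::size_t Size = 0;    // total element count, doubling as the "index"

  public:
    void push_back(const T &V) {
      std::size_t ChunkIdx = Size >> ChunkSizeLog2; // high bits: which chunk
      std::size_t Offset = Size & OffsetMask;       // low bits: slot in chunk
      if (ChunkIdx == Chunks.size())
        Chunks.push_back(new T[ChunkSize]); // grow by one chunk
      Chunks[ChunkIdx][Offset] = V;
      ++Size;
    }

    // Random access stays O(1): a shift, a mask, and two loads.
    T &operator[](std::size_t I) {
      return Chunks[I >> ChunkSizeLog2][I & OffsetMask];
    }

    std::size_t size() const { return Size; }

    ~SketchChunkedVector() {
      for (T *C : Chunks)
        delete[] C;
    }
  };

Because existing chunks are never reallocated, only the small vector of chunk pointers grows, which is what keeps growth, bi-directional iteration, and random access cheap.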

The only downside I can see is that it is possible to implement insertion into the middle of a ChunkedList with bounded complexity, but with a ChunkedVector it would still be linear. However, you don't seem to have implemented arbitrary insertion, so I'm hoping that isn't necessary.

If you go the route of ChunkedVector, I'd also suggest thinking about what a useful SSO would look like... At the least a SmallVector for the pointers to chunks, but maybe also the ability to put the first chunk inline? That might make it too big in essentially all cases.

My guess is that SmallVector<std::array<T, 4096/sizeof(T)>, N> (for some small N) is going to be hard to beat.

include/llvm/ADT/ChunkedList.h
9–24

Move to a doxygen comment on the data structure?

62–65

swap?

74–80

The location of this comment doesn't make it easy to find, and it looks like it pertains to a using declaration, which doesn't make much sense.

I'd suggest:

  1. having a (doxygen) API comment
  2. sinking the implementation details to be attached to code in question. Either to the members or to the increment logic where this is implemented

(Haven't addressed the code comments yet since the design isn't settled)

Have you considered building a ChunkedVector instead of a ChunkedList? Specifically, there is a great trick where you use a single index with the low bits being an index into the chunk and the high bits being an index into a vector of pointers. It has many of the benefits you list and is a bit simpler I think. It also supports essentially the entire vector API if desired. Both bi-directional and even random access are reasonably efficient. Good locality, etc.

With a vector-of-buffers implementation, I'm a bit worried about the space overhead in the smaller cases. For instance, here is a histogram of how this data structure gets populated during a clang bootstrap (also in https://reviews.llvm.org/D38434):

     Count: 731310
       Min: 1
      Mean: 8.555150
50th %tile: 4
95th %tile: 25
99th %tile: 53
       Max: 433

If I used a vector-of-buffers, I would either have to recompute the capacity and the end (of the last buffer) on every insert (which would require an additional dereference and some computation), or keep two extra words in the data structure on top of the three that SmallVector keeps anyway. That adds a lot of relative overhead in the median case (4 elements). In fact, the current situation of two extra words also qualifies as a "lot" of relative overhead IMO, and I want to think about an SSO to improve the situation.
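To make the word counts being compared concrete, a purely illustrative 64-bit layout sketch (these structs are stand-ins, not the real ChunkedList or llvm::SmallVector headers):

  // Illustrative only: the inline footprints under discussion.
  struct TwoWordChunkedListHeader {  // the footprint the patch aims for
    void *FirstChunk;                // start of the chunk chain
    void *NextFree;                  // insertion point (offset tricks elided)
  };

  struct ThreeWordVectorHeader {     // begin/end/capacity, as in SmallVector
    void *Begin;
    void *End;
    void *CapacityEnd;
  };

  static_assert(sizeof(TwoWordChunkedListHeader) == 2 * sizeof(void *),
                "two pointer-sized words of inline state");
  static_assert(sizeof(ThreeWordVectorHeader) == 3 * sizeof(void *),
                "three pointer-sized words of inline state");

If the stored elements are pointer-sized, even two words of header is already 50% relative overhead at the median size of 4 elements, which is the concern being raised here.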

The only downside I can see is that it is possible to implement insertion into the middle of a ChunkedList with bounded complexity, but with a ChunkedVector it would still be linear. However, you don't seem to have implemented arbitrary insertion, so I'm hoping that isn't necessary.

If you go the route of ChunkedVector, I'd also suggest thinking about what a useful SSO would look like... At the least a SmallVector for the pointers to chunks, but maybe also the ability to put the first chunk inline? That might make it too big in essentially all cases.

My guess is that SmallVector<std::array<T, 4096/sizeof(T)>, N> (for some small N) is going to be hard to beat.

Just to be clear, you mean SmallVector<std::array<T, 4096/sizeof(T)> *, N>, right? Btw, 4096 seems a bit high based on the histogram in https://reviews.llvm.org/D38434, but we can obviously lower it.

Hold on, the objects here are just pointers? Then none of this really makes sense to me...

Chunked data structures seem to make the most sense if moving the objects is really expensive and/or the objects are really large.

For pointers, why not just a vector?

sanjoy planned changes to this revision. Oct 12 2017, 10:29 PM

I wanted to use a ChunkedList to avoid "slack" in the allocated memory, since a vector can waste up to half of the memory it currently has allocated. But I'll keep this complexity out of LLVM for now (as I've said above, I was already somewhat on the fence); we can pull this patch in later if needed.
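For context on the slack figure, a tiny illustration; vector growth factors are implementation-defined, and a doubling policy is assumed here:

  #include <cstdio>
  #include <vector>

  int main() {
    std::vector<void *> V;
    V.reserve(8);
    for (int I = 0; I < 9; ++I)
      V.push_back(nullptr);
    // With a doubling growth policy, 9 elements typically sit in a 16-slot
    // allocation: 7 unused slots, i.e. roughly 44% of the allocation is slack.
    std::printf("size=%zu capacity=%zu\n", V.size(), V.capacity());
    return 0;
  }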

sanjoy abandoned this revision. Jan 29 2022, 5:37 PM